tigr-glimmer / 6d2f66d
Restructured docs, added docs from Glimmer2 that might be helpful for users Andreas Tille 13 years ago
25 changed file(s) with 1782 addition(s) and 1125 deletion(s).
00 docs/notes.pdf
1 debian/glimmer2_docs
0 #!/bin/sh
1
2 BINDIR=/usr/lib/tigr-glimmer
3
4 if [ $# -lt 1 ] ; then
5 echo "Usage: $0 <program> [arguments]" 1>&2
6 echo " Existing programs are:" 1>&2
7 ls "${BINDIR}" 1>&2
8 exit 1
9 fi
10
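11 # Dispatch to the requested program. Expansions are quoted and "$@"
12 # is passed through so that arguments containing spaces survive.
13 PROGRAM=$1
14 shift
15
16 if [ -x "${BINDIR}/${PROGRAM}" ]; then
17 exec "${BINDIR}/${PROGRAM}" "$@"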
18 else
19 echo "Error: ${PROGRAM} does not exist in TIGR Glimmer" 1>&2
20 echo " Existing programs are:" 1>&2
21 ls "${BINDIR}" 1>&2
22 exit 1
23 fi
0 .TH GLIMMER 1 "April 16, 2008"
1 .SH NAME
2 glimmer \- runs various programs of the TIGR Glimmer suite
3 .SH SYNOPSIS
4 .B glimmer
5 .B program
6 [arguments]
7 .SH DESCRIPTION
8 This manual page documents briefly the
9 .B glimmer
10 wrapper to the TIGR Glimmer programs.
11 This manual page was written for the Debian GNU/Linux distribution
12 because upstream does not provide this wrapper. It was invented
13 for Debian to avoid name space pollution that might cause
14 conflicts with other packages.
15 .PP
16 \fBglimmer\fP is just a wrapper that invokes the various programs in the
17 TIGR Glimmer software package. You can get more detailed documentation
18 in /usr/share/doc/tigr-glimmer. Please note that the documentation there
19 belongs to the former version, Glimmer 2. Glimmer 3 has
20 some features that are described in the notes.pdf document inside
21 the documentation directory.
22 .PP
23 The following programs are included: anomaly, build-fixed, build-icm,
24 entropy-profile, entropy-score, extract, glimmer3, long-orfs, multi-extract,
25 score-fixed, start-codon-distrib, test, uncovered and window-acgt.
26 .SH OPTIONS
27 There are no options.
28 .SH EXAMPLES
29 .IP glimmer\ build-icm
30 .IP glimmer\ long-orfs
31 .SH SEE ALSO
32 For the previously packaged version Glimmer2, some text files from
33 the documentation were turned into man pages for the Debian GNU/Linux
34 distribution by Steffen Moeller <moeller@debian.org>.
35 .BR treetool (1),
36 .br
37 .SH AUTHOR
38 This manual page was written by Stephane Bortzmeyer <bortzmeyer@debian.org>
39 and Dr. Guenter Bechly <gbechly@debian.org>, for the Debian GNU/Linux system
40 (but may be used by others).
0 This file and all files in this release of the Glimmer system are
1 copyright (c) 1999 and (c) 2000 by Arthur Delcher, Steven Salzberg,
2 Simon Kasif, and Owen White. All rights reserved. Redistribution
3 is not permitted without the express written permission of
4 the authors.
5
6 Glimmer 2.0 is described in:
7 A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
8 Improved Microbial Gene Identification with Glimmer.
9 Nucleic Acids Research, 27 (1999), 4636-4641.
10 Please reference this paper if you use the system as part of any
11 published research. Note that Glimmer 1.0 is described in
12 S. Salzberg, A. Delcher, S. Kasif, and O. White.
13 Microbial Gene Identification using Interpolated Markov Models.
14 Nucleic Acids Research, 26:2 (1998), 544-548.
15
16 Quickstart: if you just want to run Glimmer 2.0 on your genome
17 and you don't want to adjust any parameters (although we don't
18 recommend this), you can simply compile this system and run
19 it with the included run-glimmer2 script. E.g.:
20 unix-prompt> make
21 [various compilation messages appear]
22 unix-prompt> run-glimmer2 mygenome
23
24 run-glimmer2 will create an Interpolated Markov Model of your genome
25 and store it in a binary file called tmp.model. It will store
26 the predicted gene coordinates in g2.coord. Along the way
27 it will extract long ORFs and store them and their coordinates
28 in tmp.train and tmp.coord.
29
30 Recommended: read the readmes.
31
32 Glimmer 1.0 had 4 readme files, and Glimmer 2.0 maintains that
33 structure. The four main programs are:
34 1. long-orfs
35 2. extract
36 3. build-icm
37 4. glimmer2
38 There are files called *.readme for each of these programs. Please
39 read these first before emailing the authors with any questions.
40
41 Art Delcher, adelcher@tigr.org, was the primary programmer for
42 most of the Glimmer 2.0 code, and he can answer most technical
43 questions.
44
45 CHANGELOG, 7/31/00:
46 - Weak scores are now only invoked with the -w option. Any weak-score
47 gene is rejected automatically by an overlap with a regular gene.
48 - Weak-score genes and "voted" genes are now annotated by [Weak] and
49 [Vote] in the final listing. Voted genes are those which have a
50 significant number of relatively high-scoring subregions. Voted
51 genes also are rejected automatically by overlaps with regular genes.
52 - Weak scores are computed to be more independent of architecture-dependent
53 floating-point features. (Previously, 64-bit machines would sometimes
54 generate different results from 32-bit machines.)
55 - Fixed bug in RNABin function that occurred when the gene
56 started on the very last base of the genome. This function is
57 now not called at all if the Choose_First_Start_Codon option is
58 selected (which is the default).
59 - Fixed problem that occurred on short pieces of genome when one
60 frame (or more) had no stop codons.
61 - Added an ignore option (-i) to specify a list of regions in which no predictions
62 will be made, such as ribosomal RNAs. This feature has not yet been
63 thoroughly tested.
64
65 CHANGELOG, 9 December 2002
66 - Raw scores are now printed in the main listing and in []'s in
67 the final list of putative genes.
68 - Add +S option to use a "stricter" independent (intergenic) model
69 that discounts stop codons. Since only orfs (which have no stop
70 codons) are ever scored, the independent model is at a disadvantage
71 unless it also assumes that it is only scoring orfs. Thus, with the
72 +S option, the independent score is done codon by codon.
73 The probabilities of codons are initially set to what the
74 previous independent model would be:
75 The probability of a codon "atg", for example is:
76 Pr[a] * Pr[t] * Pr[g]
77 Then each of these is divided by the sum of the probabilities of the
78 non-stop codons.
79 - Add -L option to specify the name of a file containing a list
80 of coordinates. The genes in these lists are scored separately by
81 the ICM, output, and then the program stops (i.e., no
82 overlapping/voting rules).
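
The codon-by-codon independent model described for the +S option can be illustrated with a short Python sketch. It is not Glimmer's code: the function name codon_probabilities and the dictionary layout are assumptions made for illustration. Only the rule itself comes from the changelog entry: take the product of the three base probabilities, then renormalise over the non-stop codons (the stop codons taa, tag and tga are the ones named in the glimmer2 readme).

```python
from itertools import product

# Stop codons as listed in the glimmer2 readme.
STOP_CODONS = {"taa", "tag", "tga"}


def codon_probabilities(base_prob):
    """Return a probability for each non-stop codon.

    base_prob maps each base ('a', 'c', 'g', 't') to its independent
    probability. Each codon starts as the product of its base
    probabilities and is then divided by the total over non-stop codons.
    """
    raw = {}
    for codon in map("".join, product("acgt", repeat=3)):
        if codon in STOP_CODONS:
            continue  # stop codons never occur inside an orf
        p = 1.0
        for base in codon:
            p *= base_prob[base]
        raw[codon] = p
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}
```

With uniform base probabilities every one of the 61 non-stop codons ends up with the same probability, which is a quick sanity check on the normalisation.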
83
84 CHANGELOG, 5 February 2003
85 - The strict independent (intergenic) model is now the only mode.
86 The +S option is tolerated but has no effect.
87
88 CHANGELOG, 18 April 2003
89 - Compute the optimal length for minimum "long" orfs, so that the
90 program will return the largest number of orfs possible. The -g
91 switch still works if specified, but I don't know why anyone would
92 want to use that for a training set.
93 - Change minimum overlap by default to be 0. This means that genes
94 that overlap even by 1 base will be considered in conflict by Glimmer,
95 and the program will try to adjust their start codons to remove the
96 conflict or else delete one of the genes.
97
98 CHANGELOG, 7 October 2003
99 - Fix bug in long-orfs.cc to avoid occasional array out-of-bounds
100 error (detected on Mac OS X).
0 // Copyright (c) 1997-99 by Arthur Delcher, Steven Salzberg, Simon
1 // Kasif, and Owen White. All rights reserved. Redistribution
2 // is not permitted without the express written permission of
3 // the authors.
4
5 Program build-icm.c creates and outputs an interpolated Markov
6 model (IMM) as described in the paper
7 A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
8 Improved Microbial Gene Identification with Glimmer.
9 Nucleic Acids Research, 1999, in press.
10 Please reference this paper if you use the system as part of any
11 published research.
12
13 Input comes from the file named on the command-line. Format should be
14 one string per line. Each line has an ID string followed by white space
15 followed by the sequence itself. The script run-glimmer2 generates
16 an input file in the correct format using the 'extract' program.
17
18 The IMM is constructed as follows: For a given context, say
19 acgtta, we want to estimate the probability distribution of the
20 next character. We shall do this as a linear combination of the
21 observed probability distributions for this context and all of
22 its suffixes, i.e., cgtta, gtta, tta, ta, a and empty. By
23 observed distributions I mean the counts of the number of
24 occurrences of these strings in the training set. The linear
25 combination is determined by a set of probabilities, lambda, one
26 for each context string. For context acgtta the linear combination
27 coefficients are:
28 lambda (acgtta)
29 (1 - lambda (acgtta)) x lambda (cgtta)
30 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x lambda (gtta)
31 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta)) x lambda (tta)
32 :
33 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))
34 x (1 - lambda (tta)) x (1 - lambda (ta)) x (1 - lambda (a))
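
The coefficient list above can be generated mechanically. The following Python sketch is only an illustration of that pattern, not the build-icm implementation; the function name interpolation_weights and the use of a plain dictionary for the lambda values are assumptions.

```python
def interpolation_weights(context, lam):
    """Return (suffix, weight) pairs from the full context down to the
    empty context.

    lam maps every non-empty suffix of context to its lambda value.
    Each suffix receives its own lambda times the product of
    (1 - lambda) of all longer suffixes; the empty context receives
    whatever weight is left over.
    """
    weights = []
    remaining = 1.0  # product of (1 - lambda) over the longer suffixes
    for i in range(len(context)):
        suffix = context[i:]
        weights.append((suffix, remaining * lam[suffix]))
        remaining *= 1.0 - lam[suffix]
    weights.append(("", remaining))
    return weights
```

Because each step hands the unused share of the weight to the next shorter suffix, the weights always sum to 1, so the combination of the observed distributions is itself a probability distribution.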
35
36 We compute the lambda values for each context as follows:
37 - If the number of observations in the training set is >= the constant
38 SAMPLE_SIZE_BOUND, the lambda for that context is 1.0
39 - Otherwise, do a chi-square test on the observations for this context
40 compared to the distribution predicted for the one-character shorter
41 suffix context.
42 If the chi-square significance < 0.5, set the lambda for this context to 0.0
43 Otherwise set the lambda for this context to:
44 (chi-square significance) x (# observations) / SAMPLE_WEIGHT
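
The lambda rule can be written as a small function. This Python sketch assumes the chi-square significance has already been computed elsewhere and is passed in as a number between 0 and 1. The default values given for SAMPLE_SIZE_BOUND and SAMPLE_WEIGHT are placeholders chosen for illustration, not values taken from this readme.

```python
def context_lambda(n_obs, chi2_significance,
                   sample_size_bound=400, sample_weight=400.0):
    """Lambda for one context, following the three cases in the text.

    n_obs is the number of training-set observations of the context.
    chi2_significance is the significance of the chi-square test against
    the distribution predicted by the one-character-shorter suffix.
    The two constants default to placeholder values (assumptions).
    """
    if n_obs >= sample_size_bound:
        return 1.0  # enough data: trust this context completely
    if chi2_significance < 0.5:
        return 0.0  # not distinguishable from the shorter context
    return chi2_significance * n_obs / sample_weight
```

The result can be fed into a dictionary of lambda values, one entry per context string, which is the form the linear combination needs.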
45
46 To compile the program:
47
48 g++ build-icm.c -lm -o build-icm
49
50 Uses include files delcher.h context.h strarray.h gene.h
51
52 To run the program:
53
54 build-icm <train.seq >train.model
55
56 This will use the training data in train.seq to produce the file
57 train.model, containing your IMM.
58
59
0 // Copyright (c) 1997 by Arthur Delcher, Steven Salzberg, Simon
1 // Kasif, and Owen White. All rights reserved. Redistribution
2 // is not permitted without the express written permission of
3 // the authors.
4
5 Program extract takes a FASTA format sequence file and a file
6 with a list of start/stop positions in that file (e.g., as produced
7 by the long-orfs program) and extracts and outputs the
8 specified sequences.
9
10 The first command-line argument is the name of the sequence file,
11 which must be in FASTA format.
12
13 The second command-line argument is the name of the coordinate file.
14 It must contain a list of pairs of positions in the first file, one
15 per line. The format of each entry is:
16 <IDstring> <start position> <stop position>
17 This file should contain no other information, so if you're using
18 the output of glimmer or long-orfs , you'll have to cut off
19 header lines.
20
21 The output of the program goes to the standard output and has one
22 line for each line in the coordinate file. Each line contains
23 the IDstring , followed by white space, followed by the substring
24 of the sequence file specified by the coordinate pair. Specifically,
25 the substring starts at the first position of the pair and ends at
26 the second position (inclusive). If the first position is bigger
27 than the second, then the DNA reverse complement of the substring
28 is generated. Start/stop pairs that "wrap around" the end of the
29 genome are allowed.
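
The coordinate convention can be made concrete with a small Python sketch. It is not the extract program: the function names and the keyword arguments skip_start and min_len (standing in for the -skip and -l switches described below) are assumptions, and pairs that wrap around the end of the genome are deliberately not handled here.

```python
# Complement table for upper- and lower-case DNA.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(genome, start, end, skip_start=False, min_len=0):
    """Return the substring for a 1-based, inclusive coordinate pair.

    If start > end the region lies on the reverse strand, so the
    reverse complement is returned. Sequences shorter than min_len
    (measured before the start codon is skipped) yield None.
    Wrap-around pairs are not supported in this sketch.
    """
    if start <= end:
        sub = genome[start - 1:end]
    else:
        sub = reverse_complement(genome[end - 1:start])
    if len(sub) < min_len:
        return None
    return sub[3:] if skip_start else sub
```

Checking the length before dropping the first three characters mirrors the statement that n includes the skipped start codon.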
30
31 There are two optional command-line arguments:
32
33 -skip makes the output omit the first 3 characters of each sequence,
34 i.e., it skips over the start codon. This was the default
35 behaviour of the previous version of the program.
36
37 -l n makes the output omit any sequences shorter than n characters.
38 n includes the 3 skipped characters if the -skip switch
39 is on.
40
41 To compile the program:
42
43 g++ extract.c -lm -o extract
44
45 Uses include file delcher.h
46
47
48 To run the program:
49
50 extract genome.seq list.coord <options>
51
52 where genome.seq is a genome sequence in FASTA format and
53 list.coord is a list of start/stop pairs
54
0 // Copyright (c) 1997-99 by Arthur Delcher, Steven Salzberg, Simon
1 // Kasif, and Owen White. All rights reserved. Redistribution
2 // is not permitted without the express written permission of
3 // the authors.
4
5 // Version 1.02 revised 25 Feb 98 to ignore the independent
6 // (random) model for long orfs. The default
7 // length for "long" in this case is set to the length at which
8 // exactly 1 orf of this length would be expected per 1 million
9 // bases given the gc content of the genome. This value also can be
10 // set by command-line option -q .
11
12 // Version 1.03 revised 8 Feb 99 to make it easier to specify
13 // start and stop codons.
14
15 // Version 1.04 revised 10 May 99 to add -l command-line switch
16 // to both glimmer and long-orfs to regard genome as *NOT*
17 // circular. Default is to regard it as circular.
18 // Version 2.0 uses a tree-based IMM as described in the references
19 // given in the README file. It also implements an extensive new
20 // algorithm (see the paper) to adjust the start locations of genes
21 // whose initial coordinates result in an overlap.
22
23 // Version: 2.01 31 Jul 98
24 // Change probability model
25 // Simplify wraparounds
26 // Move start codons to eliminate overlaps
27 // Discount independent model scores when
28 // there are no overlaps
29 // Uses Harmon's model
30
31 // Version: 2.03 9 Dec 2002
32 // Include raw scores in output
33 // Add strict option to use independent intergenic
34 // model that discounts stop codons
35 // Add option to score each entry from a list of coordinates
36 // separately, without overlapping/voting rules
37
38 // Version: 2.10 5 Feb 2003
39 // Strict option to use independent intergenic
40 // model that discounts stop codons is only behaviour
41
42 // Version: 2.11 18 Apr 2003
43 // Change long-orfs to automatically compute the
44 // optimal value of ORF length in order to maximize
45 // the amount of training data.
46 Program glimmer takes two inputs: a sequence file (in FASTA format)
47 and a collection of Markov models for genes as produced by the program
48 build-icm . It outputs a list of all open reading frames (orfs) together
49 with scores for each as a gene.
50
51 The first few lines of output specify the settings of various
52 parameters in the program:
53
54 Minimum gene length is the length of the smallest fragment
55 considered to be a gene. The length is measured from the first base
56 of the start codon to the last base *before* the stop codon.
57 This value can be specified when running the program with the -g option.
58
59 Minimum overlap length is a lower bound on the number of bases overlap
60 between 2 genes that is considered a problem. Overlaps shorter than
61 this are ignored.
62
63 Minimum overlap percent is another lower bound on the number of bases
64 overlap that is considered a problem. Overlaps shorter than this
65 percentage of *both* genes are ignored.
66
67 Threshold score is the minimum in-frame score for a fragment to be
68 considered a potential gene.
69
70 Use independent scores indicates whether the last column that scores each
71 fragment using independent base probabilities is present.
72
73 Use first start codon indicates whether the first possible start codon
74 is used or not. If not, the function Choose_Start is called to
75 choose the start codon. Currently it computes hybridization energy
76 between the string Ribosome_Pattern and the region in front of
77 the start codon, and if this is above a threshold, that start site
78 is chosen. The ribosome pattern string can be set by the -s option.
79 Presumably function Choose_Start should be modified to do something
80 cleverer.
81
82 Currently used start codons are atg, gtg & ttg . These can be changed
83 in the function Is_Start , but corresponding changes should be
84 made in Choose_Start .
85
86
87 The next portion of the output is the result for each orf:
88
89 Column 1 is an ID number for reference purposes. It is assigned
90 sequentially starting with 1 to all orfs whose Gene Score is
91 at least 90 . I'll make this a command-line option when I decide
92 what letter to use.
93
94 Column 2 is the reading frame of the orf. Three forward (F1, F2 and F3)
95 and three reverse (R1, R2 and R3). These correspond with the headings
96 for the scores in columns 9-14.
97
98 Column 3 is the start position of the orf, i.e., the first base *after*
99 the previous stop codon.
100
101 Column 4 is the position of the first base of the first start codon in
102 the orf. Currently I use atg, ctg, gtg and ttg as start codons.
103
104 Column 5 is the position of the last base *before* the stop codon. Stop
105 codons are taa, tag, and tga. Note that orfs in the reverse
106 reading frames have their start position higher than the end position.
107 The order in which orfs are listed is in increasing order by
108 Max {OrfStart, End}, i.e., the highest numbered position in the orf,
109 except for orfs that "wrap around" the end of the sequence.
110
111 Columns 6 and 7 are the lengths of the orf and gene, respectively, i.e.,
112 1 + |OrfStart - End| and 1 + |GeneStart - End| .
113
114 Column 8 is the score for the gene region. It is the probability (as
115 a percent) that the Markov model in the correct frame generated this
116 sequence. This value matches the value in the corresponding column
117 of frame scores--an orf in reading frame R1 has a Gene Score equal to
118 the value in the R1 column of frame scores for that orf.
119
120 Columns 9-14 are the scores for the gene region in each of the 6 reading
121 frames. It is the probability (as a percent) that the Markov model in
122 that frame generated this sequence.
123
124 Column 15 is the probability as a percent that the gene sequence was generated
125 by a model of independent probabilities for each base, and represents to
126 some extent the probability that the sequence is "random".
127
128
129 When two genes with ID numbers overlap by at least a sufficient
130 amount (as determined by Min_Olap and Min_Olap_Percent ), a line
131 beginning with *** is printed and scores for the overlap region
132 are printed. If the frame of the high score of the overlap
133 region matches the frame of the longer gene, then a message is
134 printed that the shorter gene is rejected. Otherwise, a message
135 is printed that *both* genes are "suspect". A suspect or reject
136 message for any gene is only printed once, however.
137
138 A message is also printed if a gene with an ID number wholly contains another
139 gene with an ID number. The longer "shadows" the shorter.
140
141
142 At the end a list of "putative" gene positions is produced. The first
143 column is the ID number, the second is the start position, the third
144 is the end position. For "suspect" genes, a notation in [] 's follows:
145
146 [Bad Olap a b c] means that gene number a overlapped this one and
147 was shorter but scored higher on the overlap region. b is the length
148 of the overlap region and c is the score of *this* gene on the overlap
149 region. There should be a [Shorter ...] notation with gene a
150 giving its score.
151
152 [Shorter a b c] means that gene number a overlapped this one and
153 was longer but scored lower on the overlap region. b is the length
154 of the overlap region and c is the score of *this* gene on the overlap
155 region. There should be a [Bad olap ...] notation with gene a
156 giving its score.
157
158 [Shadowed by a] means that this gene was completely contained as part
159 of gene a 's region, but in another frame.
160
161 [Delay by a b c d] means that this gene was tentatively rejected
162 because of an overlap with gene b , but if the start codon is postponed
163 by a positions, then this would be a valid gene. The start position
164 reported for this gene includes the delay. c is the length of the overlap
165 region that caused the rejection and d is the score in this gene's frame
166 on that overlap region.
167
168 [Weak] means that this gene did not meet the regular scoring threshold,
169 but if the independent model were ignored, its score would be high
170 enough. Should only occur if the -w option is used.
171
172 [Vote] means that this gene did not meet the regular scoring threshold,
173 but sufficiently many of its subranges had high enough scores to
174 indicate it might be a gene.
175
176 Note that a gene marked as rejected may appear in this list. This can
177 occur if the gene that caused the rejection was itself rejected. The
178 actual algorithm to produce the list is as follows:
179
180 Consider the genes in decreasing order by length. If gene x is to
181 be rejected because of an overlap with longer gene y that has not been
182 rejected, then gene x is rejected and does not appear in the list.
183 Otherwise, all notations for gene x that are not caused by rejected
184 genes are reported.
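
The list-building rule can be sketched in a few lines of Python. This is an illustration of the rule as stated, not code from glimmer2; the function name and the two dictionaries are assumptions, and the filtering of the individual [] notations is left out.

```python
def final_gene_list(lengths, rejected_by):
    """Return the ids of genes that survive, longest first.

    lengths maps gene id -> gene length.
    rejected_by maps gene id -> ids of longer genes whose overlap would
    cause that gene to be rejected.
    A gene is dropped only if at least one of the genes that would
    reject it has itself survived.
    """
    rejected = set()
    kept = []
    for gene in sorted(lengths, key=lengths.get, reverse=True):
        causes = rejected_by.get(gene, [])
        if any(other not in rejected for other in causes):
            rejected.add(gene)
        else:
            kept.append(gene)
    return kept
```

Processing in decreasing order of length guarantees that the fate of every longer gene is already known when a shorter gene is examined, which is why a gene marked as rejected can still appear in the list once its rejecting gene has been removed.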
185
186 I think a "delayed" gene might incorrectly be listed as causing a problem
187 by the part of it that was eliminated by the delay. Probably the remaining
188 portion should be reinserted into the sorted list based on its now-shorter
189 length, and any notations caused by it should be re-checked to see if
190 they're affected by shortening the gene. Let's save this for the next
191 version.
192
193
194
195 Specifying Different Start and Stop Codons:
196
197 To specify different sets of start and stop codons, modify the file
198 gene.h . Specifically, the functions:
199
200 Is_Forward_Start Is_Reverse_Start Is_Start
201 Is_Forward_Stop Is_Reverse_Stop Is_Stop
202
203 are used to determine what is used for start and stop codons.
204
205 Is_Start and Is_Stop do simple string comparisons to specify
206 which patterns are used. To add a new pattern, just add the comparison
207 for it. To remove a pattern, comment out or delete the comparison
208 for it.
209
210 The other four functions use a bit comparison to determine start and
211 stop patterns. They represent a codon as a 12-bit pattern, with 4 bits
212 for each base, one bit for each possible value of the bases, T, G, C
213 or A. Thus the bit pattern 0010 0101 1100 represents the base
214 pattern [C] [A or G] [G or T]. By doing bit operations (& | ~) and
215 comparisons, more complicated patterns involving ambiguous reads
216 can be tested efficiently. Simple patterns can be tested as in
217 the current code.
218
219 For example, to insert an additional start codon of CAT requires 3 changes:
220 1. The line
221 || (Codon & 0x218) == Codon
222 should be inserted into Is_Forward_Start , since 0x218 = 0010 0001 1000
223 represents CAT.
224 2. The line
225 || (Codon & 0x184) == Codon
226 should be inserted into Is_Reverse_Start , since 0x184 = 0001 1000 0100
227 represents ATG, which is the reverse-complement of CAT. Alternately,
228 the #define constant ATG_MASK could be used.
229 3. The line
230 || strncmp (S, "cat", 3) == 0
231 should be inserted into Is_Start .
232 If not automatically using the first start codon, some changes might
233 also be made to the function Choose_Start .
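
The 12-bit encoding can be checked with a short Python sketch. The helper names encode_codon and matches are assumptions made for illustration; the bit assignment (T, G, C, A from the highest to the lowest bit of each group of four) and the test (Codon & mask) == Codon are the ones described above.

```python
# One bit per base within each 4-bit group: T=1000, G=0100, C=0010, A=0001.
BASE_BITS = {"t": 0b1000, "g": 0b0100, "c": 0b0010, "a": 0b0001}


def encode_codon(codon):
    """Pack a three-letter codon into a 12-bit integer, 4 bits per base."""
    value = 0
    for base in codon.lower():
        value = (value << 4) | BASE_BITS[base]
    return value


def matches(codon_bits, mask):
    """True if every base the codon may contain is allowed by the mask."""
    return (codon_bits & mask) == codon_bits


CAT_MASK = encode_codon("cat")   # equals 0x218
ATG_MASK = encode_codon("atg")   # equals 0x184
AMBIGUOUS = 0x25C                # [C] [A or G] [G or T]
```

For instance, matches(encode_codon("cag"), AMBIGUOUS) is true because each base of cag is permitted at its position, while ctg fails on its middle base.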
234
235
236
237 To compile the program:
238
239 Use the Makefile. It will put the executables in a bin subdirectory.
240
241 To compile just this program use:
242
243 g++ glimmer2.c -lm -o glimmer
244
245 Uses include files delcher.h context.h strarray.h gene.h
246
247
248 To run the program:
249
250 First run build-icm on a set of sequences to make the Markov models.
251
252 build-icm <train.seq >train.model
253
254 This will produce a file train.model. You can call this file anything
255 you like, train.model, myicm, itsrainingtoday, etc.
256
257 Then run glimmer2
258
259 glimmer2 hflu.seq train.model
260
261 Options can be specified after the 2nd file name
262
263 glimmer2 hflu.seq train.model <options>
264
265 Options are:
266 -f Use ribosome-binding energy to choose start codon. This is
267 not fully tested and likely to be buggy. Better not to use it.
268 +f Use first codon in orf as start codon
269 -g n Set minimum gene length to n
270 -i s Ignore bases within the coordinates listed in file s. File s
271 should consist of one base pair per line (no tags), and the ignore
272 region should be a multiple of three bases long. [Somewhat buggy]
273 -l Regard the genome as linear (not circular), i.e., do not allow
274 genes to "wrap around" the end of the genome.
275 This option works on both glimmer and long-orfs .
276 The default behavior is to regard the genome as circular.
277 -o n Set minimum overlap length to n. Overlaps shorter than this
278 are ignored.
279 -p n Set minimum overlap percentage to n%. Overlaps shorter than
280 this percentage of *both* strings are ignored.
281 -q n If using independent model scores (+r option), it will only
282 apply to orfs shorter than n . The default value for n
283 has an expectation of one orf that length or longer occurring
284 per million bases in a random genome with the same gc content
285 -r Don't use independent probability score column
286 +r Use independent probability score column
287 -s s Use string s as the ribosome binding pattern to find start codons.
288 Not fully tested and known to have bugs.
289 -t n Set threshold score for calling as gene to n. If the in-frame
290 score >= n, then the region is given a number and considered
291 a potential gene.
292 -w n Use "weak" scores on potential genes at least n bases long.
293 Weak scores ignore the independent model.
294 -X Allow orfs extending off ends of sequence to be scored
0 // Copyright (c) 1997-99 by Arthur Delcher, Steven Salzberg, Simon
1 // Kasif, and Owen White. All rights reserved. Redistribution
2 // is not permitted without the express written permission of
3 // the authors.
4 // Version: 1.1 April 2003 (S. Salzberg)
5 // Compute the optimal length for minimum "long"
6 // orfs, so that the program will return the largest
7 // number of orfs possible. The -g switch still works
8 // if specified, but I don't know why anyone would want
9 // to use that for a training set.
10 // Also, change min overlap by default to be 0.
11 // Version 1.04 revised 10 May 99 to add -l command-line switch
12 // to both glimmer and long-orfs to regard genome as *NOT*
13 // circular. Default is to regard it as circular.
14
15 Program long-orfs takes a sequence file (in FASTA format) and
16 outputs a list of all long "potential genes" in it that do not
17 overlap by too much. By "potential gene" I mean the portion of
18 an orf from the first start codon to the stop codon at the end.
19
20 The first few lines of output specify the settings of various
21 parameters in the program:
22
23 Minimum gene length is the length of the smallest fragment
24 considered to be a gene. The length is measured from the first base
25 of the start codon to the last base *before* the stop codon.
26 This value can be specified when running the program with the -g option.
27 By default, the program now (April 2003) will compute an optimal length
28 for this parameter, where "optimal" is the value that produces the
29 greatest number of long ORFs, thereby increasing the amount of data
30 used for training.
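
One way to picture the "optimal" minimum length is a brute-force search over candidate values. The Python sketch below is only an illustration under several assumptions and is not the method used by long-orfs: orfs are given as (start, end) pairs with start <= end, two orfs conflict when they share more than max_olap positions, both members of a conflicting pair are dropped, the overlap percentage is ignored, and the candidate lengths are supplied by the caller.

```python
def overlap(a, b):
    """Number of positions shared by two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def surviving_orfs(orfs, min_len, max_olap=0):
    """Orfs of at least min_len that conflict with no other long orf."""
    long_orfs = [o for o in orfs if o[1] - o[0] + 1 >= min_len]
    return [o for i, o in enumerate(long_orfs)
            if all(i == j or overlap(o, p) <= max_olap
                   for j, p in enumerate(long_orfs))]


def best_min_length(orfs, candidates, max_olap=0):
    """Candidate minimum length that leaves the most surviving orfs."""
    return max(candidates,
               key=lambda n: len(surviving_orfs(orfs, n, max_olap)))
```

The sketch shows why the best value is neither the smallest nor the largest: a low threshold admits many short orfs that overlap and eliminate each other, while a high threshold discards usable training data.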
31
32 Minimum overlap length is a lower bound on the number of bases overlap
33 between 2 genes that is considered a problem. Overlaps shorter than
34 this are ignored.
35
36 Minimum overlap percent is another lower bound on the number of bases
37 overlap that is considered a problem. Overlaps shorter than this
38 percentage of *both* genes are ignored.
39
40 The next portion of the output is a list of potential genes:
41
42 Column 1 is an ID number for reference purposes. It is assigned
43 sequentially starting with 1 to all long potential genes. If
44 overlapping genes are eliminated, gaps in the numbers will occur.
45 The ID prefix is specified in the constant ID_PREFIX .
46
47 Column 2 is the position of the first base of the first start codon in
48 the orf. Currently I use atg and gtg as start codons. This is
49 easily changed in the function Is_Start () .
50
51 Column 3 is the position of the last base *before* the stop codon. Stop
52 codons are taa, tag, and tga. Note that orfs in the reverse
53 reading frames have their start position higher than the end position.
54 The order in which orfs are listed is in increasing order by
55 Max {OrfStart, End}, i.e., the highest numbered position in the orf,
56 except for orfs that "wrap around" the end of the sequence.
57
58 When two genes with ID numbers overlap by at least a sufficient
59 amount (as determined by Min_Olap and Min_Olap_Percent ), they
60 are eliminated and do not appear in the output.
61
62 The final output of the program (sent to the standard error file so
63 it does not show up when output is redirected to a file) is the
64 length of the longest orf found.
65
66
67
68 Specifying Different Start and Stop Codons:
69
70 To specify different sets of start and stop codons, modify the file
71 gene.h . Specifically, the functions:
72
73 Is_Forward_Start Is_Reverse_Start Is_Start
74 Is_Forward_Stop Is_Reverse_Stop Is_Stop
75
76 are used to determine what is used for start and stop codons.
77
78 Is_Start and Is_Stop do simple string comparisons to specify
79 which patterns are used. To add a new pattern, just add the comparison
80 for it. To remove a pattern, comment out or delete the comparison
81 for it.
82
83 The other four functions use a bit comparison to determine start and
84 stop patterns. They represent a codon as a 12-bit pattern, with 4 bits
85 for each base, one bit for each possible value of the bases, T, G, C
86 or A. Thus the bit pattern 0010 0101 1100 represents the base
87 pattern [C] [A or G] [G or T]. By doing bit operations (& | ~) and
88 comparisons, more complicated patterns involving ambiguous reads
89 can be tested efficiently. Simple patterns can be tested as in
90 the current code.
91
92 For example, to insert an additional start codon of CAT requires 3 changes:
93 1. The line
94 || (Codon & 0x218) == Codon
95 should be inserted into Is_Forward_Start , since 0x218 = 0010 0001 1000
96 represents CAT.
97 2. The line
98 || (Codon & 0x184) == Codon
99 should be inserted into Is_Reverse_Start , since 0x184 = 0001 1000 0100
100 represents ATG, which is the reverse-complement of CAT. Alternately,
101 the #define constant ATG_MASK could be used.
102 3. The line
103 || strncmp (S, "cat", 3) == 0
104 should be inserted into Is_Start .
105
106
107
108 To compile the program:
109
110 g++ long-orfs.c -lm -o long-orfs
111
112 Uses include files delcher.h gene.h
113
114
115 To run the program:
116
117 long-orfs genome.seq
118
119 where genome.seq is a genome sequence in FASTA format.
120
121 Options can be specified after the genome file name
122
123 long-orfs genome.seq <options>
124
125 Options are:
126 -g n Set minimum gene length to n. Default is to compute an
127 optimal value automatically. Don't change this unless you
128 know what you're doing.
129 -l Regard the genome as linear (not circular), i.e., do not allow
130 genes to "wrap around" the end of the genome.
131 This option works on both glimmer and long-orfs .
132 The default behavior is to regard the genome as circular.
133 -o n Set maximum overlap length to n. Overlaps shorter than this
134 are permitted. (Default is 0 bp.)
135 -p n Set maximum overlap percentage to n%. Overlaps shorter than
136 this percentage of *both* strings are ignored. (Default is 10%.)
137
138 If you *DON'T* want to eliminate overlapping genes, just use the -p 100
139 option.
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@debian.org</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 The program lacks a description
59 </refpurpose>
60 </refnamediv>
61 <refsynopsisdiv>
62 <cmdsynopsis>
63 <command>tigr-anomaly</command>
64 <arg>dna-file</arg>
65 <arg>coord-file</arg>
66 </cmdsynopsis>
67 </refsynopsisdiv>
68 <refsect1>
69 <title>DESCRIPTION</title>
70 <para>
71 </para>
72
73 </refsect1>
74 <refsect1>
75 <title>OPTIONS</title>
76 </refsect1>
77 <refsect1>
78 <title>SEE ALSO</title>
79 <para>
80 tigr-glimmer3 (1),
81 tigr-adjust (1),
82 tigr-anomaly (1),
83 tigr-build-icm (1),
84 tigr-check (1),
85 tigr-codon-usage (1),
86 tigr-compare-lists (1),
87 tigr-extract (1),
88 tigr-generate (1),
89 tigr-get-len (1),
90 tigr-get-putative (1)
91 </para>
92 <para>
93 http://www.tigr.org/software/glimmer/
94 </para>
95
96 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
97 </refsect1>
98 <refsect1>
99 <title>AUTHOR</title>
100
101 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
102 the &debian; system.
103 </para>
104
105 </refsect1>
106 </refentry>
107
108 <!-- Keep this comment at the end of the file
109 Local variables:
110 mode: sgml
111 sgml-omittag:t
112 sgml-shorttag:t
113 sgml-minimize-attributes:nil
114 sgml-always-quote-attributes:t
115 sgml-indent-step:2
116 sgml-indent-data:t
117 sgml-parent-document:nil
118 sgml-default-dtd-file:nil
119 sgml-exposed-tags:nil
120 sgml-local-catalogs:nil
121 sgml-local-ecat-files:nil
122 End:
123 -->
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@debian.org</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56 <refpurpose>Creates and outputs an interpolated Markov model (IMM)</refpurpose>
57 </refnamediv>
58 <refsynopsisdiv>
59 <cmdsynopsis>
60 <command>tigr-build-icm</command>
61 </cmdsynopsis>
62 </refsynopsisdiv>
63 <refsect1>
64 <title>DESCRIPTION</title>
65 <para>
66 Program build-icm.c creates and outputs an interpolated Markov
67 model (IMM) as described in the paper
68 A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
69 Improved Microbial Gene Identification with Glimmer.
70 Nucleic Acids Research, 1999, in press.
71 Please reference this paper if you use the system as part of any
72 published research.
73 </para><para>
74 Input comes from the file named on the command-line. Format should be
75 one string per line. Each line has an ID string followed by white space
76 followed by the sequence itself. The script run-glimmer3 generates
77 an input file in the correct format using the 'extract' program.
78 </para><para>
79 The IMM is constructed as follows: For a given context, say
80 acgtta, we want to estimate the probability distribution of the
81 next character. We shall do this as a linear combination of the
82 observed probability distributions for this context and all of
83 its suffixes, i.e., cgtta, gtta, tta, ta, a and empty. By
84 observed distributions I mean the counts of the number of
85 occurrences of these strings in the training set. The linear
86 combination is determined by a set of probabilities, lambda, one
87 for each context string. For context acgtta the linear combination
88 coefficients are:
89 </para><para>
90 lambda (acgtta)
91 (1 - lambda (acgtta)) x lambda (cgtta)
92 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x lambda (gtta)
93 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta)) x lambda (tta)
94 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))
95 x (1 - lambda (tta)) x (1 - lambda (ta)) x (1 - lambda (a))
96 </para><para>
97 We compute the lambda values for each context as follows:
98 - If the number of observations in the training set is &gt;= the constant
99 SAMPLE_SIZE_BOUND, the lambda for that context is 1.0
100 - Otherwise, do a chi-square test on the observations for this context
101 compared to the distribution predicted for the one-character shorter
102 suffix context.
103 If the chi-square significance &lt; 0.5, set the lambda for this context to 0.0
104 Otherwise set the lambda for this context to:
105 (chi-square significance) x (# observations) / SAMPLE_WEIGHT
106 </para><para>
107 To run the program:
108 </para><para>
109 build-icm &lt;train.seq &gt; train.model
110 </para><para>
111 This will use the training data in train.seq to produce the file
112 train.model, containing your IMM.
113 </para>
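<para>
The lambda rule and the linear combination described above can be sketched
in C as follows. This is only an illustration, not the build-icm source:
the function names and the array layout (index 0 is the full context) are
made up, the numeric values given to SAMPLE_SIZE_BOUND and SAMPLE_WEIGHT are
placeholders, and the chi-square significance is passed in rather than
computed.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values: the text names these constants but does not
 * state what they are set to. */
#define SAMPLE_SIZE_BOUND 400
#define SAMPLE_WEIGHT     400.0

/* Lambda for one context, following the cases listed above.
 * significance is the chi-square significance against the
 * one-character shorter suffix context. */
double context_lambda(int observations, double significance)
{
    assert(observations >= 0);
    if (observations >= SAMPLE_SIZE_BOUND)
        return 1.0;
    if (significance < 0.5)
        return 0.0;
    return significance * observations / SAMPLE_WEIGHT;
}

/* Fill weight[0..n] from lambda[0..n-1].  lambda[i] belongs to the
 * context with its first i characters removed (i = 0 is the full
 * context); weight[n] is the share left over for the empty context. */
void interpolation_weights(const double *lambda, size_t n, double *weight)
{
    double remaining = 1.0;  /* product of (1 - lambda) of longer contexts */
    size_t i;

    for (i = 0; i < n; i++) {
        weight[i] = remaining * lambda[i];
        remaining *= 1.0 - lambda[i];
    }
    weight[n] = remaining;
}
```
]]></programlisting>
<para>
For two contexts that both have lambda 0.5, the weights come out as 0.5,
0.25 and 0.25, so they always sum to 1.
</para>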
114 </refsect1>
115 <refsect1>
116 <title>SEE ALSO</title>
117 <para>
118 tigr-glimmer3 (1),
119 tigr-long-orfs (1),
120 tigr-adjust (1),
121 tigr-anomaly (1),
122 tigr-extract (1),
123 tigr-check (1),
124 tigr-codon-usage (1),
125 tigr-compare-lists (1),
127 tigr-generate (1),
128 tigr-get-len (1),
129 tigr-get-putative (1)
130 </para>
131 <para>http://www.tigr.org/software/glimmer/</para>
132 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
133 </refsect1>
134 <refsect1>
135 <title>AUTHOR</title>
136
137 <para>This manual page was quickly copied from the glimmer web site and readme file by &dhusername; &dhemail; for
138 the &debian; system.
139 </para>
140
141 </refsect1>
142 </refentry>
143
144 <!-- Keep this comment at the end of the file
145 Local variables:
146 mode: sgml
147 sgml-omittag:t
148 sgml-shorttag:t
149 sgml-minimize-attributes:nil
150 sgml-always-quote-attributes:t
151 sgml-indent-step:2
152 sgml-indent-data:t
153 sgml-parent-document:nil
154 sgml-default-dtd-file:nil
155 sgml-exposed-tags:nil
156 sgml-local-catalogs:nil
157 sgml-local-ecat-files:nil
158 End:
159 -->
160
161
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@debian.org</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 Extract the sequences at given start/stop positions from a genome sequence
59 </refpurpose>
60 </refnamediv>
61 <refsynopsisdiv>
62 <cmdsynopsis>
63 <command>tigr-extract</command>
64 <arg>genome-file <option><replaceable>options</replaceable></option></arg>
65 </cmdsynopsis>
66 </refsynopsisdiv>
67 <refsect1>
68 <title>DESCRIPTION</title>
69 <para>
70 Program extract takes a FASTA format sequence file and a file
71 with a list of start/stop positions in that file (e.g., as produced
72 by the long-orfs program) and extracts and outputs the
73 specified sequences.
74 </para><para>
75 The first command-line argument is the name of the sequence file,
76 which must be in FASTA format.
77 </para><para>
78 The second command-line argument is the name of the coordinate file.
79 It must contain a list of pairs of positions in the first file, one
80 per line. The format of each entry is:
81 </para><para> &lt;IDstring&gt; &lt;start position&gt; &lt;stop position&gt;
82 </para><para>This file should contain no other information, so if you're using
83 the output of glimmer or long-orfs, you'll have to cut off
84 header lines.
85 </para><para>
86 The output of the program goes to the standard output and has one
87 line for each line in the coordinate file. Each line contains
88 the IDstring, followed by white space, followed by the substring
89 of the sequence file specified by the coordinate pair. Specifically,
90 the substring starts at the first position of the pair and ends at
91 the second position (inclusive). If the first position is bigger
92 than the second, then the reverse complement of that stretch of DNA
93 is generated. Start/stop pairs that "wrap around" the end of the
94 genome are allowed.
95 </para>
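<para>
The extraction rule described above can be sketched in C as follows. This
is an illustration rather than the code of extract: the function names are
invented, positions are assumed to be 1-based, and start/stop pairs that
wrap around the end of the genome are deliberately left out.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement of one base; other characters are passed through. */
char complement(char base)
{
    switch (base) {
    case 'a': return 't';
    case 'c': return 'g';
    case 'g': return 'c';
    case 't': return 'a';
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return base;
    }
}

/* Copy positions start..stop (1-based, inclusive) of genome into out,
 * which must hold at least |start - stop| + 2 bytes.  If start is
 * bigger than stop, the reverse complement is produced.  Wrap-around
 * pairs are not handled in this sketch. */
void extract_region(const char *genome, size_t start, size_t stop, char *out)
{
    size_t len = strlen(genome);
    size_t i, n = 0;

    assert(start >= 1 && start <= len && stop >= 1 && stop <= len);
    if (start <= stop) {
        for (i = start; i <= stop; i++)
            out[n++] = genome[i - 1];
    } else {
        for (i = start; i >= stop; i--)
            out[n++] = complement(genome[i - 1]);
    }
    out[n] = '\0';
}
```
]]></programlisting>
<para>
For the sequence acgtta, the pair 2 4 yields cgt and the pair 4 2 yields
acg.
</para>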
96 </refsect1>
97 <refsect1>
98 <title>OPTIONS</title>
99 <variablelist>
100 <varlistentry>
101 <term><option>-skip</option></term>
102 <listitem>
103 <para> makes the output omit the first 3 characters of each sequence, i.e., it skips over the start codon. This was the behaviour of the previous version of the program.</para>
104 </listitem>
105 </varlistentry>
106 <varlistentry>
107 <term><option>-l <replaceable>n</replaceable></option></term><listitem><para>
108 makes the output omit any sequences shorter than n characters.
109 n includes the 3 skipped characters if the -skip switch
110 is on.
111 </para></listitem>
112 </varlistentry>
113 </variablelist>
114 </refsect1>
115 <refsect1>
116 <title>SEE ALSO</title>
117 <para>
118 tigr-glimmer3 (1),
119 tigr-long-orfs (1),
120 tigr-adjust (1),
121 tigr-anomaly (1),
122 tigr-build-icm (1),
123 tigr-check (1),
124 tigr-codon-usage (1),
125 tigr-compare-lists (1),
126 tigr-extract (1),
127 tigr-generate (1),
128 tigr-get-len (1),
129 tigr-get-putative (1)
130 </para>
131 <para>
132 http://www.tigr.org/software/glimmer/
133 </para>
134
135 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
136 </refsect1>
137 <refsect1>
138 <title>AUTHOR</title>
139
140 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
141 the &debian; system.
142 </para>
143
144 </refsect1>
145 </refentry>
146
147 <!-- Keep this comment at the end of the file
148 Local variables:
149 mode: sgml
150 sgml-omittag:t
151 sgml-shorttag:t
152 sgml-minimize-attributes:nil
153 sgml-always-quote-attributes:t
154 sgml-indent-step:2
155 sgml-indent-data:t
156 sgml-parent-document:nil
157 sgml-default-dtd-file:nil
158 sgml-exposed-tags:nil
159 sgml-local-catalogs:nil
160 sgml-local-ecat-files:nil
161 End:
162 -->
163
164
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@debian.org</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56 <refpurpose>
57 Find/Score potential genes in genome-file using the probability model in icm-file
58 </refpurpose>
59 </refnamediv>
60 <refsynopsisdiv>
61 <cmdsynopsis>
62 <command>tigr-glimmer3</command>
63 <arg><option><replaceable>genome-file</replaceable></option></arg>
64 <arg><option><replaceable>icm-file</replaceable></option></arg>
65 <arg><option><replaceable>[options]</replaceable></option></arg>
66 </cmdsynopsis>
67 </refsynopsisdiv>
68 <refsect1>
69 <title>DESCRIPTION</title>
70 <para>
71 <command>&dhpackage;</command> is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. <command>&dhpackage;</command> (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on <command>&dhpackage;</command> 1.0 and in our subsequent paper on <command>&dhpackage;</command> 2.0, uses a combination of Markov models from 1st through 8th-order, weighting each model according to its predictive power. <command>&dhpackage;</command> 1.0 and 2.0 use 3-periodic nonhomogenous Markov models in their IMMs.
72 </para><para>
73 <command>&dhpackage;</command> is the primary microbial gene finder at TIGR, and has been used to annotate the complete genomes of B. burgdorferi (Fraser et al., Nature, Dec. 1997), T. pallidum (Fraser et al., Science, July 1998), T. maritima, D. radiodurans, M. tuberculosis, and non-TIGR projects including C. trachomatis, C. pneumoniae, and others. Its analyses of some of these genomes and others are available at the TIGR microbial database site.
74 </para><para>
75 A special version of <command>&dhpackage;</command> designed for small eukaryotes, GlimmerM, was used to find the genes in chromosome 2 of the malaria parasite, P. falciparum. GlimmerM is described in S.L. Salzberg, M. Pertea, A.L. Delcher, M.J. Gardner, and H. Tettelin, "Interpolated Markov models for eukaryotic gene finding," Genomics 59 (1999), 24-31. The GlimmerM site (http://www.tigr.org/software/glimmerm/) includes information on how to download the GlimmerM system.
76 </para><para>
77 The <command>&dhpackage;</command> system consists of two main programs. The first of these is the training program, build-imm. This program takes an input set of sequences and builds and outputs the IMM for them. These sequences can be complete genes or just partial orfs. For a new genome, this training data can consist of those genes with strong database hits as well as very long open reading frames that are statistically almost certain to be genes. The second program is glimmer, which uses this IMM to identify putative genes in an entire genome. <command>&dhpackage;</command> automatically resolves conflicts between most overlapping genes by choosing one of them. It also identifies genes that are suspected to truly overlap, and flags these for closer inspection by the user. These ``suspect'' gene candidates have been a very small percentage of the total for all the genomes analyzed thus far.
78 </para>
79 </refsect1>
80 <refsect1>
81 <title>OPTIONS</title>
82 <variablelist>
83 <varlistentry>
84 <term><option>-C <replaceable>n</replaceable></option></term>
85 <listitem>
86 <para>Use n as GC percentage of independent model</para>
87 <para>Note: n should be a percentage, e.g., -C 45.2</para>
88 </listitem>
89 </varlistentry>
90 <varlistentry>
91 <term><option>-f</option></term><listitem><para>Use ribosome-binding energy to choose start codon</para></listitem>
92 </varlistentry>
93 <varlistentry>
94 <term><option>+f</option></term><listitem><para>Use first codon in orf as start codon</para></listitem>
95 </varlistentry>
96 <varlistentry>
97 <term><option>-g <replaceable>n</replaceable></option></term><listitem><para>Set minimum gene length to n</para></listitem>
98 </varlistentry>
99 <varlistentry>
100 <term><option>-i <replaceable>filename</replaceable></option></term>
101 <listitem>
102 <para>Use <option><replaceable>filename</replaceable></option>
103 to select regions of bases that are off
104 limits, so that no bases within that area will be examined
105 </para>
106 </listitem>
107 </varlistentry>
108 <varlistentry>
109 <term><option>-l</option></term>
110 <listitem><para>Assume linear rather than circular genome, i.e., no wraparound</para></listitem>
111 </varlistentry>
112 <varlistentry>
113 <term><option>-L <replaceable>filename</replaceable></option></term>
114 <listitem><para>Use filename to specify a list of orfs that should
115 be scored separately, with no overlap rules
116 </para></listitem>
117 </varlistentry>
118 <varlistentry>
119 <term><option>-M</option></term>
120 <listitem><para>Input is a multifasta file of separate genes to be scored
121 separately, with no overlap rules
122 </para>
123 </listitem>
124 </varlistentry>
125 <varlistentry>
126 <term><option>-o <replaceable>n</replaceable></option></term>
127 <listitem>
128 <para>Set minimum overlap length to n. Overlaps shorter than this
129 are ignored.
130 </para></listitem>
131 </varlistentry>
132 <varlistentry>
133 <term><option>-p <replaceable>n</replaceable></option></term>
134 <listitem>
135 <para>
136 Set minimum overlap percentage to n%. Overlaps shorter than this percentage of *both* strings are ignored.
137 </para>
138 </listitem>
139 </varlistentry>
140 <varlistentry>
141 <term><option>-q <replaceable>n</replaceable></option></term>
142 <listitem>
143 <para>Set the maximum length orf that can be rejected because of
144 the independent probability score column to (n - 1)
145 </para>
146 </listitem>
147 </varlistentry>
148 <varlistentry>
149 <term><option>-r</option></term>
150 <listitem>
151 <para>
152 Don't use independent probability score column
153 </para>
154 </listitem>
155 </varlistentry>
156 <varlistentry>
157 <term><option>+r</option></term>
158 <listitem><para>
159 Use independent probability score column
160 </para>
161 </listitem>
162 </varlistentry>
163 <varlistentry>
169 <term><option>-s <replaceable>s</replaceable></option></term>
170 <listitem><para> Use string s as the ribosome binding pattern to find start codons.</para>
171 </listitem>
172 </varlistentry>
173 <varlistentry>
174 <term><option>+S</option></term>
175 <listitem>
176 <para>
177 Use the stricter independent intergenic model that doesn't
178 give probabilities to in-frame stop codons. (This option is obsolete
179 since this is now the only behaviour.)
180 </para> </listitem>
181 </varlistentry>
182 <varlistentry>
183 <term><option>-t <replaceable>n</replaceable></option></term>
184 <listitem><para>
185 Set threshold score for calling as gene to n. If the in-frame
186 score >= n, then the region is given a number and considered
187 a potential gene.
188 </para> </listitem>
189 </varlistentry>
190 <varlistentry>
191 <term><option>-w <replaceable>n</replaceable> </option></term>
192 <listitem><para>
193 Use "weak" scores on tentative genes n or longer. Weak
194 scores ignore the independent probability score.
195 </para></listitem>
196 </varlistentry>
197 </variablelist>
198 </refsect1>
199 <refsect1>
200 <title>SEE ALSO</title>
201 <para>
202 tigr-adjust (1),
203 tigr-anomaly (1),
204 tigr-build-icm (1),
205 tigr-check (1),
206 tigr-codon-usage (1),
207 tigr-compare-lists (1),
208 tigr-extract (1),
209 tigr-generate (1),
210 tigr-get-len (1),
211 tigr-get-putative (1),
212 tigr-glimmer3 (1),
213 tigr-long-orfs (1)
214 </para>
215 <para>
216 http://www.tigr.org/software/glimmer/
217 </para>
218 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
219 </refsect1>
220 <refsect1>
221 <title>AUTHOR</title>
222 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
223 the &debian; system.
224 </para>
225 </refsect1>
226 </refentry>
227
228 <!-- Keep this comment at the end of the file
229 Local variables:
230 mode: sgml
231 sgml-omittag:t
232 sgml-shorttag:t
233 sgml-minimize-attributes:nil
234 sgml-always-quote-attributes:t
235 sgml-indent-step:2
236 sgml-indent-data:t
237 sgml-parent-document:nil
238 sgml-default-dtd-file:nil
239 sgml-exposed-tags:nil
240 sgml-local-catalogs:nil
241 sgml-local-ecat-files:nil
242 End:
243 -->
244
245
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@debian.org</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>LONG-ORFS</refentrytitle>">
27 <!ENTITY dhpackage "long-orfs">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 Find long potential genes (orfs) in genome-file that do not
59 overlap by too much
60 </refpurpose>
61 </refnamediv>
62 <refsynopsisdiv>
63 <cmdsynopsis>
64 <command>tigr-long-orfs</command>
65 <arg>genome-file <option><replaceable>options</replaceable></option></arg>
66 </cmdsynopsis>
67 </refsynopsisdiv>
68 <refsect1>
69 <title>DESCRIPTION</title>
70 <para>
71 Program long-orfs takes a sequence file (in FASTA format) and
72 outputs a list of all long "potential genes" in it that do not
73 overlap by too much. By "potential gene" I mean the portion of
74 an orf from the first start codon to the stop codon at the end.
75 </para><para>
76 The first few lines of output specify the settings of various
77 parameters in the program:
78 </para><para>
79 Minimum gene length is the length of the smallest fragment
80 considered to be a gene. The length is measured from the first base
81 of the start codon to the last base *before* the stop codon.
82 This value can be specified when running the program with the -g option.
83 By default, the program now (April 2003) will compute an optimal length
84 for this parameter, where "optimal" is the value that produces the
85 greatest number of long ORFs, thereby increasing the amount of data
86 used for training.
87 </para><para>
88 Minimum overlap length is a lower bound on the number of bases overlap
89 between 2 genes that is considered a problem. Overlaps shorter than
90 this are ignored.
91 </para><para>
92 Minimum overlap percent is another lower bound on the number of bases
93 overlap that is considered a problem. Overlaps shorter than this
94 percentage of *both* genes are ignored.
95 </para><para>
96 The next portion of the output is a list of potential genes:
97 </para><para>
98 Column 1 is an ID number for reference purposes. It is assigned
99 sequentially starting with 1 to all long potential genes. If
100 overlapping genes are eliminated, gaps in the numbers will occur.
101 The ID prefix is specified in the constant ID_PREFIX.
102 </para><para>
103 Column 2 is the position of the first base of the first start codon in
104 the orf. Currently I use atg and gtg as start codons. This is
105 easily changed in the function Is_Start ().
106 </para><para>
107 Column 3 is the position of the last base *before* the stop codon. Stop
108 codons are taa, tag, and tga. Note that orfs in the reverse
109 reading frames have their start position higher than the end position.
110 The order in which orfs are listed is in increasing order by
111 Max {OrfStart, End}, i.e., the highest numbered position in the orf,
112 except for orfs that "wrap around" the end of the sequence.
113 </para><para>
114 When two genes with ID numbers overlap by at least a sufficient
115 amount (as determined by Min_Olap and Min_Olap_Percent), they
116 are eliminated and do not appear in the output.
117 </para><para>
118 The final output of the program (sent to the standard error file so
119 it does not show up when output is redirected to a file) is the
120 length of the longest orf found.
121 </para><para>
122
123
124 Specifying Different Start and Stop Codons:
125 </para><para>
126 To specify different sets of start and stop codons, modify the file
127 gene.h. Specifically, the functions:
128 </para><para>
129 Is_Forward_Start Is_Reverse_Start Is_Start
130 Is_Forward_Stop Is_Reverse_Stop Is_Stop
131 </para><para>
132 are used to determine what is used for start and stop codons.
133 </para><para>
134 Is_Start and Is_Stop do simple string comparisons to specify
135 which patterns are used. To add a new pattern, just add the comparison
136 for it. To remove a pattern, comment out or delete the comparison
137 for it.
138 </para><para>
139 The other four functions use a bit comparison to determine start and
140 stop patterns. They represent a codon as a 12-bit pattern, with 4 bits
141 for each base, one bit for each possible value of the bases, T, G, C
142 or A. Thus the bit pattern 0010 0101 1100 represents the base
143 pattern [C] [A or G] [G or T]. By doing bit operations (& | ~) and
144 comparisons, more complicated patterns involving ambiguous reads
145 can be tested efficiently. Simple patterns can be tested as in
146 the current code.
147 </para><para>
148 For example, inserting an additional start codon of CAT requires 3 changes:
149 1. The line
150 || (Codon & 0x218) == Codon
151 should be inserted into Is_Forward_Start, since 0x218 = 0010 0001 1000
152 represents CAT.
153 2. The line
154 || (Codon & 0x184) == Codon
155 should be inserted into Is_Reverse_Start, since 0x184 = 0001 1000 0100
156 represents ATG, which is the reverse-complement of CAT. Alternatively,
157 the #define constant ATG_MASK could be used.
158 3. The line
159 || strncmp (S, "cat", 3) == 0
160 should be inserted into Is_Start.
161 </para>
162
163 </refsect1>
164 <refsect1>
165 <title>OPTIONS</title>
166 <variablelist>
167 <varlistentry>
168 <term><option>-g <replaceable>n</replaceable></option></term>
169 <listitem>
170 <para> Set minimum gene length to n. Default is to compute an
171 optimal value automatically. Don't change this unless you
172 know what you're doing.</para>
173 </listitem>
174 </varlistentry>
175 <varlistentry>
176 <term><option>-l</option></term><listitem><para>Regard the genome as linear (not circular), i.e., do not allow
177 genes to "wrap around" the end of the genome.
178 This option works on both glimmer and long-orfs.
179 The default behavior is to regard the genome as circular.</para></listitem>
180 </varlistentry>
181 <varlistentry>
182 <term><option>-o <replaceable>n</replaceable></option></term><listitem><para>Set maximum overlap length to n. Overlaps shorter than this
183 are permitted. (Default is 0 bp.)</para></listitem>
184 </varlistentry>
185 <varlistentry>
186 <term><option>-p <replaceable>n</replaceable></option></term><listitem><para>Set maximum overlap percentage to n%. Overlaps shorter than
187 this percentage of *both* strings are ignored. (Default is 10%.)</para></listitem>
188 </varlistentry>
189 </variablelist>
190 </refsect1>
191 <refsect1>
192 <title>SEE ALSO</title>
193 <para>
194 tigr-glimmer3 (1),
195 tigr-adjust (1),
196 tigr-anomaly (1),
197 tigr-build-icm (1),
198 tigr-check (1),
199 tigr-codon-usage (1),
200 tigr-compare-lists (1),
201 tigr-extract (1),
202 tigr-generate (1),
203 tigr-get-len (1),
204 tigr-get-putative (1),
205 </para>
206 <para>
207 http://www.tigr.org/software/glimmer/
208 </para>
209
210 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
211 </refsect1>
212 <refsect1>
213 <title>AUTHOR</title>
214
215 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
216 the &debian; system.
217 </para>
218
219 </refsect1>
220 </refentry>
221
222 <!-- Keep this comment at the end of the file
223 Local variables:
224 mode: sgml
225 sgml-omittag:t
226 sgml-shorttag:t
227 sgml-minimize-attributes:nil
228 sgml-always-quote-attributes:t
229 sgml-indent-step:2
230 sgml-indent-data:t
231 sgml-parent-document:nil
232 sgml-default-dtd-file:nil
233 sgml-exposed-tags:nil
234 sgml-local-catalogs:nil
235 sgml-local-ecat-files:nil
236 End:
237 -->
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@debian.org</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 Apply the suite of programs within glimmer3 to a prokaryotic or archaeal genome.
59 </refpurpose>
60 </refnamediv>
61 <refsynopsisdiv>
62 <cmdsynopsis>
63 <command>tigr-run-glimmer3</command>
64 </cmdsynopsis>
65 </refsynopsisdiv>
66 <refsect1>
67 <title>DESCRIPTION</title>
68 <para>
69 A shell script that wraps a set of tigr-* utilities of the glimmer package to retrieve coding regions.
70 </para>
71 </refsect1>
72 <refsect1>
73 <title>SEE ALSO</title>
74 <para>
75 tigr-glimmer3 (1),
76 tigr-adjust (1),
77 tigr-anomaly (1),
78 tigr-build-icm (1),
79 tigr-check (1),
80 tigr-codon-usage (1),
81 tigr-compare-lists (1),
82 tigr-extract (1),
83 tigr-generate (1),
84 tigr-get-len (1),
85 tigr-get-putative (1),
86 tigr-long-orfs (1),
87 </para>
88 <para>
89 http://www.tigr.org/software/glimmer/
90 </para>
91
92 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
93 </refsect1>
94 <refsect1>
95 <title>AUTHOR</title>
96
97 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
98 the &debian; system.
99 </para>
100
101 </refsect1>
102 </refentry>
103
104 <!-- Keep this comment at the end of the file
105 Local variables:
106 mode: sgml
107 sgml-omittag:t
108 sgml-shorttag:t
109 sgml-minimize-attributes:nil
110 sgml-always-quote-attributes:t
111 sgml-indent-step:2
112 sgml-indent-data:t
113 sgml-parent-document:nil
114 sgml-default-dtd-file:nil
115 sgml-exposed-tags:nil
116 sgml-local-catalogs:nil
117 sgml-local-ecat-files:nil
118 End:
119 -->
0 bin/* usr/bin
0 bin/* usr/lib
1 1 debian/tigr-run-glimmer3 usr/bin
0 debian/tigr-*.1
0 debian/*.1
1 debian/glimmer2_mans/*.1
5 5 include /usr/share/cdbs/1/rules/debhelper.mk
6 6 include /usr/share/cdbs/1/class/makefile.mk
7 7
8 MANPAGES=debian/tigr-anomaly.1 \
9 debian/tigr-build-icm.1 \
10 debian/tigr-extract.1 \
11 debian/tigr-glimmer3.1 \
12 debian/tigr-long-orfs.1 \
13 debian/tigr-run-glimmer3.1
8 MANPAGES=debian/glimmer2_mans/tigr-anomaly.1 \
9 debian/glimmer2_mans/tigr-build-icm.1 \
10 debian/glimmer2_mans/tigr-extract.1 \
11 debian/glimmer2_mans/tigr-glimmer3.1 \
12 debian/glimmer2_mans/tigr-long-orfs.1 \
13 debian/glimmer2_mans/tigr-run-glimmer3.1
14 14
15 15 .SUFFIXES: .1 .sgml
16 16
23 23 rm -f bin/* lib/* obj/*
24 24
25 25 build/tigr-glimmer:: $(MANPAGES)
26 cd bin; for bin in `ls | grep -v "^tigr-"` ; do mv "$$bin" tigr-"$$bin" ; done
26 # cd bin; for bin in `ls | grep -v "^tigr-"` ; do mv "$$bin" tigr-"$$bin" ; done
+0
-24
debian/tigr
0 #!/bin/sh
1
2 BINDIR=/usr/lib/tigr
3
4 if [ $# -lt 1 ] ; then
5 echo "Usage: $0 <program>" 1>&2
6 echo " Existing programs are:"
7 ls ${BINDIR}
8 exit 1
9 fi
10
11 WRAPPER=$0
12 PROGRAM=$1
13 shift
14 ARGS=$*
15
16 if [ -x ${BINDIR}/${PROGRAM} ]; then
17 exec ${BINDIR}/${PROGRAM} ${ARGS}
18 else
19 echo "Usage: ${PROGRAM} does not exist in Tigr Glimmer"
20 echo " Existing programs are:"
21 ls ${BINDIR}
22 exit 1
23 fi
+0
-124
debian/tigr-anomaly.sgml
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@pzr.uni-rostock.de</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 The program lacks a description
59 </refpurpose>
60 </refnamediv>
61 <refsynopsisdiv>
62 <cmdsynopsis>
63 <command>tigr-anomaly</command>
64 <arg>dna-file</arg>
65 <arg>coord-file</arg>
66 </cmdsynopsis>
67 </refsynopsisdiv>
68 <refsect1>
69 <title>DESCRIPTION</title>
70 <para>
71 </para>
72
73 </refsect1>
74 <refsect1>
75 <title>OPTIONS</title>
76 </refsect1>
77 <refsect1>
78 <title>SEE ALSO</title>
79 <para>
80 tigr-glimmer3 (1),
81 tigr-adjust (1),
82 tigr-anomaly (1),
83 tigr-build-icm (1),
84 tigr-check (1),
85 tigr-codon-usage (1),
86 tigr-compare-lists (1),
87 tigr-extract (1),
88 tigr-generate (1),
89 tigr-get-len (1),
90 tigr-get-putative (1),
91 </para>
92 <para>
93 http://www.tigr.org/software/glimmer/
94 </para>
95
96 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
97 </refsect1>
98 <refsect1>
99 <title>AUTHOR</title>
100
101 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
102 the &debian; system.
103 </para>
104
105 </refsect1>
106 </refentry>
107
108 <!-- Keep this comment at the end of the file
109 Local variables:
110 mode: sgml
111 sgml-omittag:t
112 sgml-shorttag:t
113 sgml-minimize-attributes:nil
114 sgml-always-quote-attributes:t
115 sgml-indent-step:2
116 sgml-indent-data:t
117 sgml-parent-document:nil
118 sgml-default-dtd-file:nil
119 sgml-exposed-tags:nil
120 sgml-local-catalogs:nil
121 sgml-local-ecat-files:nil
122 End:
123 -->
+0
-162
debian/tigr-build-icm.sgml
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@pzr.uni-rostock.de</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56 <refpurpose>Creates and outputs an interpolated Markov model (IMM)</refpurpose>
57 </refnamediv>
58 <refsynopsisdiv>
59 <cmdsynopsis>
60 <command>tigr-build-icm</command>
61 </cmdsynopsis>
62 </refsynopsisdiv>
63 <refsect1>
64 <title>DESCRIPTION</title>
65 <para>
66 Program build-icm.c creates and outputs an interpolated Markov
67 model (IMM) as described in the paper
68 A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
69 Improved Microbial Gene Identification with Glimmer.
70 Nucleic Acids Research, 1999, in press.
71 Please reference this paper if you use the system as part of any
72 published research.
73 </para><para>
74 Input comes from the file named on the command-line. Format should be
75 one string per line. Each line has an ID string followed by white space
76 followed by the sequence itself. The script run-glimmer3 generates
77 an input file in the correct format using the 'extract' program.
78 </para><para>
79 The IMM is constructed as follows: For a given context, say
80 acgtta, we want to estimate the probability distribution of the
81 next character. We shall do this as a linear combination of the
82 observed probability distributions for this context and all of
83 its suffixes, i.e., cgtta, gtta, tta, ta, a and empty. By
84 observed distributions I mean the counts of the number of
85 occurrences of these strings in the training set. The linear
86 combination is determined by a set of probabilities, lambda, one
87 for each context string. For context acgtta the linear combination
88 coefficients are:
89 </para><para>
90 lambda (acgtta)
91 (1 - lambda (acgtta)) x lambda (cgtta)
92 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x lambda (gtta)
93 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta)) x lambda (tta)
...
94 (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))
95 x (1 - lambda (tta)) x (1 - lambda (ta)) x (1 - lambda (a))
96 </para><para>
97 We compute the lambda values for each context as follows:
98 - If the number of observations in the training set is &gt;= the constant
99 SAMPLE_SIZE_BOUND, the lambda for that context is 1.0
100 - Otherwise, do a chi-square test on the observations for this context
101 compared to the distribution predicted for the one-character shorter
102 suffix context.
103 If the chi-square significance &lt; 0.5, set the lambda for this context to 0.0
104 Otherwise set the lambda for this context to:
105 (chi-square significance) x (# observations) / SAMPLE_WEIGHT
106 </para><para>
107 To run the program:
108 </para><para>
109 build-icm &lt;train.seq &gt; train.model
110 </para><para>
111 This will use the training data in train.seq to produce the file
112 train.model, containing your IMM.
113 </para>
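The interpolation scheme described above can be sketched as follows. This is an illustration only, not the code of build-icm: the function names are hypothetical, and SAMPLE_SIZE_BOUND and SAMPLE_WEIGHT are passed in as parameters because their actual values are not given in this page.

```python
def context_lambda(n_obs, chi2_significance, sample_size_bound, sample_weight):
    """Lambda for one context, following the three rules in the text."""
    if n_obs >= sample_size_bound:
        return 1.0
    if chi2_significance < 0.5:
        return 0.0
    return chi2_significance * n_obs / sample_weight


def interpolation_weights(context, lam):
    """Return (suffix, coefficient) pairs for `context` and all of its
    suffixes, longest first, ending with the empty context.

    `lam` maps every non-empty suffix of `context` to its lambda value.
    """
    weights = []
    remaining = 1.0  # product of (1 - lambda) over all longer contexts
    for start in range(len(context)):
        suffix = context[start:]
        weights.append((suffix, remaining * lam[suffix]))
        remaining *= 1.0 - lam[suffix]
    weights.append(("", remaining))  # the empty context takes what is left
    return weights


if __name__ == "__main__":
    # Made-up lambda values, for illustration only.
    example = {"tta": 0.5, "ta": 0.5, "a": 0.5}
    for suffix, weight in interpolation_weights("tta", example):
        print(repr(suffix), weight)
```

Because each coefficient is multiplied by the share not yet claimed by longer contexts, the coefficients always sum to 1, so the result is a proper mixture of the observed distributions.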
114 </refsect1>
115 <refsect1>
116 <title>SEE ALSO</title>
117 <para>
118 tigr-glimmer3 (1),
119 tigr-long-orfs (1),
120 tigr-adjust (1),
121 tigr-anomaly (1),
122 tigr-extract (1),
123 tigr-check (1),
124 tigr-codon-usage (1),
125 tigr-compare-lists (1),
126 tigr-extract (1),
127 tigr-generate (1),
128 tigr-get-len (1),
129 tigr-get-putative (1),
130 </para>
131 <para>http://www.tigr.org/software/glimmer/</para>
132 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
133 </refsect1>
134 <refsect1>
135 <title>AUTHOR</title>
136
137 <para>This manual page was quickly copied from the glimmer web site and readme file by &dhusername; &dhemail; for
138 the &debian; system.
139 </para>
140
141 </refsect1>
142 </refentry>
143
144 <!-- Keep this comment at the end of the file
145 Local variables:
146 mode: sgml
147 sgml-omittag:t
148 sgml-shorttag:t
149 sgml-minimize-attributes:nil
150 sgml-always-quote-attributes:t
151 sgml-indent-step:2
152 sgml-indent-data:t
153 sgml-parent-document:nil
154 sgml-default-dtd-file:nil
155 sgml-exposed-tags:nil
156 sgml-local-catalogs:nil
157 sgml-local-ecat-files:nil
158 End:
159 -->
160
161
+0
-165
debian/tigr-extract.sgml
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@pzr.uni-rostock.de</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 Extract the sequences at given start/stop positions from a genome sequence
59 </refpurpose>
60 </refnamediv>
61 <refsynopsisdiv>
62 <cmdsynopsis>
63 <command>tigr-extract</command>
64 <arg>genome-file <option><replaceable>options</replaceable></option></arg>
65 </cmdsynopsis>
66 </refsynopsisdiv>
67 <refsect1>
68 <title>DESCRIPTION</title>
69 <para>
70 Program extract takes a FASTA format sequence file and a file
71 with a list of start/stop positions in that file (e.g., as produced
72 by the long-orfs program) and extracts and outputs the
73 specified sequences.
74 </para><para>
75 The first command-line argument is the name of the sequence file,
76 which must be in FASTA format.
77 </para><para>
78 The second command-line argument is the name of the coordinate file.
79 It must contain a list of pairs of positions in the first file, one
80 per line. The format of each entry is:
81 </para><para> &lt;IDstring&gt; &lt;start position&gt; &lt;stop position&gt;
82 </para><para>This file should contain no other information, so if you're using
83 the output of glimmer or long-orfs, you'll have to cut off
84 header lines.
85 </para><para>
86 The output of the program goes to the standard output and has one
87 line for each line in the coordinate file. Each line contains
88 the IDstring, followed by white space, followed by the substring
89 of the sequence file specified by the coordinate pair. Specifically,
90 the substring starts at the first position of the pair and ends at
91 the second position (inclusive). If the first position is bigger
92 than the second, then the DNA reverse complement of each position
93 is generated. Start/stop pairs that "wrap around" the end of the
94 genome are allowed.
95 </para>
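The coordinate convention described above can be illustrated with a short sketch. It assumes 1-based inclusive positions and covers only the two simple cases (forward strand, and reverse strand when the first position is larger); the wrap-around case is deliberately left out because the text does not say how it is distinguished from a reverse-strand pair. The function name is hypothetical and this is not the extract program itself.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def extract_region(genome, start, stop):
    """Return the sequence between two 1-based, inclusive positions.

    If start > stop, the region is read on the reverse strand, i.e. the
    reverse complement of genome[stop..start] is returned.
    Wrap-around pairs are not handled in this sketch.
    """
    genome = genome.lower()
    if start <= stop:
        return genome[start - 1:stop]
    forward = genome[stop - 1:start]
    return "".join(COMPLEMENT[base] for base in reversed(forward))


if __name__ == "__main__":
    print(extract_region("acgtac", 2, 4))  # forward strand
    print(extract_region("acgtac", 4, 2))  # reverse strand
```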
96 </refsect1>
97 <refsect1>
98 <title>OPTIONS</title>
99 <variablelist>
100 <varlistentry>
101 <term><option>-skip</option></term>
102 <listitem>
103 <para> makes the output omit the first 3 characters of each sequence, i.e., it skips over the start codon. This was the behaviour of the previous version of the program.</para>
104 </listitem>
105 </varlistentry>
106 <varlistentry>
107 <term><option>-l <replaceable>n</replaceable></option></term><listitem><para>
108 makes the output omit any sequences shorter than n characters.
109 n includes the 3 skipped characters if the -skip switch
110 is on.
111 </para></listitem>
112 </varlistentry>
113 </variablelist>
114 </refsect1>
115 <refsect1>
116 <title>SEE ALSO</title>
117 <para>
118 tigr-glimmer3 (1),
119 tigr-long-orfs (1),
120 tigr-adjust (1),
121 tigr-anomaly (1),
122 tigr-build-icm (1),
123 tigr-check (1),
124 tigr-codon-usage (1),
125 tigr-compare-lists (1),
126 tigr-extract (1),
127 tigr-generate (1),
128 tigr-get-len (1),
129 tigr-get-putative (1),
130 </para>
131 <para>
132 http://www.tigr.org/software/glimmer/
133 </para>
134
135 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description on how to use Glimmer3.</para>
136 </refsect1>
137 <refsect1>
138 <title>AUTHOR</title>
139
140 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
141 the &debian; system.
142 </para>
143
144 </refsect1>
145 </refentry>
146
147 <!-- Keep this comment at the end of the file
148 Local variables:
149 mode: sgml
150 sgml-omittag:t
151 sgml-shorttag:t
152 sgml-minimize-attributes:nil
153 sgml-always-quote-attributes:t
154 sgml-indent-step:2
155 sgml-indent-data:t
156 sgml-parent-document:nil
157 sgml-default-dtd-file:nil
158 sgml-exposed-tags:nil
159 sgml-local-catalogs:nil
160 sgml-local-ecat-files:nil
161 End:
162 -->
163
164
+0
-246
debian/tigr-glimmer3.sgml
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@pzr.uni-rostock.de</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56 <refpurpose>
57 Find/Score potential genes in genome-file using the probability model in icm-file
58 </refpurpose>
59 </refnamediv>
60 <refsynopsisdiv>
61 <cmdsynopsis>
62 <command>tigr-glimmer3</command>
63 <arg><option><replaceable>genome-file</replaceable></option></arg>
64 <arg><option><replaceable>icm-file</replaceable></option></arg>
65 <arg><option><replaceable>[options]</replaceable></option></arg>
66 </cmdsynopsis>
67 </refsynopsisdiv>
68 <refsect1>
69 <title>DESCRIPTION</title>
70 <para>
71 <command>&dhpackage;</command> is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. <command>&dhpackage;</command> (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on <command>&dhpackage;</command> 1.0 and in our subsequent paper on <command>&dhpackage;</command> 2.0, uses a combination of Markov models from 1st through 8th-order, weighting each model according to its predictive power. <command>&dhpackage;</command> 1.0 and 2.0 use 3-periodic nonhomogenous Markov models in their IMMs.
72 </para><para>
73 <command>&dhpackage;</command> is the primary microbial gene finder at TIGR, and has been used to annotate the complete genomes of B. burgdorferi (Fraser et al., Nature, Dec. 1997), T. pallidum (Fraser et al., Science, July 1998), T. maritima, D. radiodurans, M. tuberculosis, and non-TIGR projects including C. trachomatis, C. pneumoniae, and others. Its analyses of some of these genomes and others are available at the TIGR microbial database site.
74 </para><para>
75 A special version of <command>&dhpackage;</command> designed for small eukaryotes, GlimmerM, was used to find the genes in chromosome 2 of the malaria parasite, P. falciparum. GlimmerM is described in S.L. Salzberg, M. Pertea, A.L. Delcher, M.J. Gardner, and H. Tettelin, "Interpolated Markov models for eukaryotic gene finding," Genomics 59 (1999), 24-31. The GlimmerM site (http://www.tigr.org/software/glimmerm/) includes information on how to download the GlimmerM system.
76 </para><para>
77 The <command>&dhpackage;</command> system consists of two main programs. The first of these is the training program, build-imm. This program takes an input set of sequences and builds and outputs the IMM for them. These sequences can be complete genes or just partial orfs. For a new genome, this training data can consist of those genes with strong database hits as well as very long open reading frames that are statistically almost certain to be genes. The second program is glimmer, which uses this IMM to identify putative genes in an entire genome. <command>&dhpackage;</command> automatically resolves conflicts between most overlapping genes by choosing one of them. It also identifies genes that are suspected to truly overlap, and flags these for closer inspection by the user. These ``suspect'' gene candidates have been a very small percentage of the total for all the genomes analyzed thus far.
78 </para>
79 </refsect1>
80 <refsect1>
81 <title>OPTIONS</title>
82 <variablelist>
83 <varlistentry>
84 <term><option>-C <replaceable>n</replaceable></option></term>
85 <listitem>
86 <para>Use n as GC percentage of independent model</para>
87 <para>Note: n should be a percentage, e.g., -C 45.2</para>
88 </listitem>
89 </varlistentry>
90 <varlistentry>
91 <term>-f</term><listitem><para>Use ribosome-binding energy to choose start codon</para></listitem>
92 </varlistentry>
93 <varlistentry>
94 <term><option>+f</option></term><listitem><para>Use first codon in orf as start codon</para></listitem>
95 </varlistentry>
96 <varlistentry>
97 <term><option>-g <replaceable>n</replaceable></option></term><listitem><para>Set minimum gene length to n</para></listitem>
98 </varlistentry>
99 <varlistentry>
100 <term><option>-i <replaceable>filename</replaceable></option></term>
101 <listitem>
102 <para>Use <option><replaceable>filename</replaceable></option>
103 to select regions of bases that are off
104 limits, so that no bases within that area will be examined
105 </para>
106 </listitem>
107 </varlistentry>
108 <varlistentry>
109 <term><option>-l</option></term>
110 <listitem><para>Assume linear rather than circular genome, i.e., no wraparound</para></listitem>
111 </varlistentry>
112 <varlistentry>
113 <term><option>-L <replaceable>filename</replaceable></option></term>
114 <listitem><para>Use filename to specify a list of orfs that should
115 be scored separately, with no overlap rules
116 </para></listitem>
117 </varlistentry>
118 <varlistentry>
119 <term><option>-M</option></term>
120 <listitem><para>Input is a multifasta file of separate genes to be scored
121 separately, with no overlap rules
122 </para>
123 </listitem>
124 </varlistentry>
125 <varlistentry>
126 <term><option>-o <replaceable>n</replaceable></option></term>
127 <listitem>
128 <para>Set minimum overlap length to n. Overlaps shorter than this
129 are ignored.
130 </para></listitem>
131 </varlistentry>
132 <varlistentry>
133 <term><option>-p <replaceable>n</replaceable></option></term>
134 <listitem>
135 <para>
136 Set minimum overlap percentage to n%. Overlaps shorter than this percentage of *both* strings are ignored.
137 </para>
138 </listitem>
139 </varlistentry>
140 <varlistentry>
141 <term><option>-q <replaceable>n</replaceable></option></term>
142 <listitem>
143 <para>Set the maximum length orf that can be rejected because of
144 the independent probability score column to (n - 1)
145 </para>
146 </listitem>
147 </varlistentry>
148 <varlistentry>
149 <term><option>-r</option></term>
150 <listitem>
151 <para>
152 Don't use independent probability score column
153 </para>
154 </listitem>
155 </varlistentry>
156 <varlistentry>
157 <term><option>+r</option></term>
158 <listitem><para>
159 Use independent probability score column
160 </para>
161 </listitem>
162 </varlistentry>
163 <varlistentry>
169 <term><option>-s <replaceable>s</replaceable></option></term>
170 <listitem><para> Use string s as the ribosome binding pattern to find start codons.</para>
171 </listitem>
172 </varlistentry>
173 <varlistentry>
174 <term><option>+S</option></term>
175 <listitem>
176 <para>
177 Use the stricter independent intergenic model that doesn't
178 give probabilities to in-frame stop codons. (Option is obsolete
179 since this is now the only behaviour.)
180 </para> </listitem>
181 </varlistentry>
182 <varlistentry>
183 <term><option>-t <replaceable>n</replaceable></option></term>
184 <listitem><para>
185 Set threshold score for calling as gene to n. If the in-frame
186 score >= n, then the region is given a number and considered
187 a potential gene.
188 </para> </listitem>
189 </varlistentry>
190 <varlistentry>
191 <term><option>-w <replaceable>n</replaceable> </option></term>
192 <listitem><para>
193 Use "weak" scores on tentative genes n or longer. Weak
194 scores ignore the independent probability score.
195 </para></listitem>
196 </varlistentry>
197 </variablelist>
198 </refsect1>
199 <refsect1>
200 <title>SEE ALSO</title>
201 <para>
202 tigr-adjust (1),
203 tigr-anomaly (1),
204 tigr-build-icm (1),
205 tigr-check (1),
206 tigr-codon-usage (1),
207 tigr-compare-lists (1),
208 tigr-extract (1),
209 tigr-generate (1),
210 tigr-get-len (1),
211 tigr-get-putative (1),
212 tigr-glimmer3 (1),
213 tigr-long-orfs (1)
214 </para>
215 <para>
216 http://www.tigr.org/software/glimmer/
217 </para>
218 <para>Please see the readme in /usr/share/doc/glimmer for a description on how to use Glimmer.</para>
219 </refsect1>
220 <refsect1>
221 <title>AUTHOR</title>
222 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
223 the &debian; system.
224 </para>
225 </refsect1>
226 </refentry>
227
228 <!-- Keep this comment at the end of the file
229 Local variables:
230 mode: sgml
231 sgml-omittag:t
232 sgml-shorttag:t
233 sgml-minimize-attributes:nil
234 sgml-always-quote-attributes:t
235 sgml-indent-step:2
236 sgml-indent-data:t
237 sgml-parent-document:nil
238 sgml-default-dtd-file:nil
239 sgml-exposed-tags:nil
240 sgml-local-catalogs:nil
241 sgml-local-ecat-files:nil
242 End:
243 -->
244
245
+0
-238
debian/tigr-long-orfs.sgml
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@pzr.uni-rostock.de</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>LONG-ORFS</refentrytitle>">
27 <!ENTITY dhpackage "long-orfs">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 Find long potential genes (orfs) in genome-file that do not
59 overlap by too much
60 </refpurpose>
61 </refnamediv>
62 <refsynopsisdiv>
63 <cmdsynopsis>
64 <command>tigr-long-orfs</command>
65 <arg>genome-file <option><replaceable>options</replaceable></option></arg>
66 </cmdsynopsis>
67 </refsynopsisdiv>
68 <refsect1>
69 <title>DESCRIPTION</title>
70 <para>
71 Program long-orfs takes a sequence file (in FASTA format) and
72 outputs a list of all long "potential genes" in it that do not
73 overlap by too much. By "potential gene" I mean the portion of
74 an orf from the first start codon to the stop codon at the end.
75 </para><para>
76 The first few lines of output specify the settings of various
77 parameters in the program:
78 </para><para>
79 Minimum gene length is the length of the smallest fragment
80 considered to be a gene. The length is measured from the first base
81 of the start codon to the last base *before* the stop codon.
82 This value can be specified when running the program with the -g option.
83 By default, the program now (April 2003) will compute an optimal length
84 for this parameter, where "optimal" is the value that produces the
85 greatest number of long ORFs, thereby increasing the amount of data
86 used for training.
87 </para><para>
88 Minimum overlap length is a lower bound on the number of bases overlap
89 between 2 genes that is considered a problem. Overlaps shorter than
90 this are ignored.
91 </para><para>
92 Minimum overlap percent is another lower bound on the number of bases
93 overlap that is considered a problem. Overlaps shorter than this
94 percentage of *both* genes are ignored.
95 </para><para>
96 The next portion of the output is a list of potential genes:
97 </para><para>
98 Column 1 is an ID number for reference purposes. It is assigned
99 sequentially starting with 1 to all long potential genes. If
100 overlapping genes are eliminated, gaps in the numbers will occur.
101 The ID prefix is specified in the constant ID_PREFIX .
102 </para><para>
103 Column 2 is the position of the first base of the first start codon in
104 the orf. Currently I use atg and gtg as start codons. This is
105 easily changed in the function Is_Start () .
106 </para><para>
107 Column 3 is the position of the last base *before* the stop codon. Stop
108 codons are taa, tag, and tga. Note that orfs in the reverse
109 reading frames have their start position higher than the end position.
110 The order in which orfs are listed is in increasing order by
111 Max {OrfStart, End}, i.e., the highest numbered position in the orf,
112 except for orfs that "wrap around" the end of the sequence.
113 </para><para>
114 When two genes with ID numbers overlap by at least a sufficient
115 amount (as determined by Min_Olap and Min_Olap_Percent ), they
116 are eliminated and do not appear in the output.
117 </para><para>
118 The final output of the program (sent to the standard error file so
119 it does not show up when output is redirected to a file) is the
120 length of the longest orf found.
121 </para><para>
122
123
124 Specifying Different Start and Stop Codons:
125 </para><para>
126 To specify different sets of start and stop codons, modify the file
127 gene.h . Specifically, the functions:
128 </para><para>
129 Is_Forward_Start Is_Reverse_Start Is_Start
130 Is_Forward_Stop Is_Reverse_Stop Is_Stop
131 </para><para>
132 are used to determine what is used for start and stop codons.
133 </para><para>
134 Is_Start and Is_Stop do simple string comparisons to specify
135 which patterns are used. To add a new pattern, just add the comparison
136 for it. To remove a pattern, comment out or delete the comparison
137 for it.
138 </para><para>
139 The other four functions use a bit comparison to determine start and
140 stop patterns. They represent a codon as a 12-bit pattern, with 4 bits
141 for each base, one bit for each possible value of the bases, T, G, C
142 or A. Thus the bit pattern 0010 0101 1100 represents the base
143 pattern [C] [A or G] [G or T]. By doing bit operations (& | ~) and
144 comparisons, more complicated patterns involving ambiguous reads
145 can be tested efficiently. Simple patterns can be tested as in
146 the current code.
147 </para><para>
148 For example, to insert an additional start codon of CAT requires 3 changes:
149 1. The line
150 || (Codon & 0x218) == Codon
151 should be inserted into Is_Forward_Start , since 0x218 = 0010 0001 1000
152 represents CAT.
153 2. The line
154 || (Codon & 0x184) == Codon
155 should be inserted into Is_Reverse_Start , since 0x184 = 0001 1000 0100
156 represents ATG, which is the reverse-complement of CAT. Alternatively,
157 the #define constant ATG_MASK could be used.
158 3. The line
159 || strncmp (S, "cat", 3) == 0
160 should be inserted into Is_Start .
161 </para>
162
163 </refsect1>
164 <refsect1>
165 <title>OPTIONS</title>
166 <variablelist>
167 <varlistentry>
168 <term><option>-g <replaceable>n</replaceable></option></term>
169 <listitem>
170 <para> Set minimum gene length to n. Default is to compute an
171 optimal value automatically. Don't change this unless you
172 know what you're doing.</para>
173 </listitem>
174 </varlistentry>
175 <varlistentry>
176 <term><option>-l</option></term><listitem><para>Regard the genome as linear (not circular), i.e., do not allow
177 genes to "wrap around" the end of the genome.
178 This option works on both glimmer and long-orfs .
179 The default behavior is to regard the genome as circular.</para></listitem>
180 </varlistentry>
181 <varlistentry>
182 <term><option>-o <replaceable>n</replaceable></option></term><listitem><para>Set maximum overlap length to n. Overlaps shorter than this
183 are permitted. (Default is 0 bp.)</para></listitem>
184 </varlistentry>
185 <varlistentry>
186 <term><option>-p <replaceable>n</replaceable></option></term><listitem><para>Set maximum overlap percentage to n%. Overlaps shorter than
187 this percentage of *both* strings are ignored. (Default is 10%.)</para></listitem>
188 </varlistentry>
189 </variablelist>
190 </refsect1>
191 <refsect1>
192 <title>SEE ALSO</title>
193 <para>
194 tigr-glimmer3 (1),
195 tigr-adjust (1),
196 tigr-anomaly (1),
197 tigr-build-icm (1),
198 tigr-check (1),
199 tigr-codon-usage (1),
200 tigr-compare-lists (1),
201 tigr-extract (1),
202 tigr-generate (1),
203 tigr-get-len (1),
204 tigr-get-putative (1),
205 </para>
206 <para>
207 http://www.tigr.org/software/glimmer/
208 </para>
209
210 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description of how to use Glimmer3.</para>
211 </refsect1>
212 <refsect1>
213 <title>AUTHOR</title>
214
215 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
216 the &debian; system.
217 </para>
218
219 </refsect1>
220 </refentry>
221
222 <!-- Keep this comment at the end of the file
223 Local variables:
224 mode: sgml
225 sgml-omittag:t
226 sgml-shorttag:t
227 sgml-minimize-attributes:nil
228 sgml-always-quote-attributes:t
229 sgml-indent-step:2
230 sgml-indent-data:t
231 sgml-parent-document:nil
232 sgml-default-dtd-file:nil
233 sgml-exposed-tags:nil
234 sgml-local-catalogs:nil
235 sgml-local-ecat-files:nil
236 End:
237 -->
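The 12-bit codon encoding described in the removed tigr-long-orfs.sgml page above can be illustrated with a small C sketch. It is only an illustration of the bit layout stated in the text (4 bits per base, in the order T, G, C, A) and is not taken from gene.h; the names base_bit, codon_bits and matches_mask are hypothetical helpers made up for this example.

```c
/* Bit assigned to each base inside a 4-bit group, in the order T, G, C, A
   given in the text (T = 1000, G = 0100, C = 0010, A = 0001). */
enum { BIT_A = 0x1, BIT_C = 0x2, BIT_G = 0x4, BIT_T = 0x8 };

/* Hypothetical helper: map one unambiguous base to its bit, 0 if unknown. */
static unsigned base_bit(char ch)
{
    switch (ch) {
    case 'a': case 'A': return BIT_A;
    case 'c': case 'C': return BIT_C;
    case 'g': case 'G': return BIT_G;
    case 't': case 'T': return BIT_T;
    default:            return 0;
    }
}

/* Hypothetical helper: pack three bases into a 12-bit pattern, first base in
   the highest 4 bits. Returns 0 if any base is not one of a, c, g, t. */
static unsigned codon_bits(const char *s)
{
    unsigned b0 = base_bit(s[0]);
    unsigned b1 = b0 ? base_bit(s[1]) : 0;  /* stop early on a short string */
    unsigned b2 = b1 ? base_bit(s[2]) : 0;
    if (!b0 || !b1 || !b2)
        return 0;
    return (b0 << 8) | (b1 << 4) | b2;
}

/* Same comparison as in the text: every bit set in codon must also be set
   in mask. A zero codon (unknown base) never matches in this sketch. */
static int matches_mask(unsigned codon, unsigned mask)
{
    return codon != 0 && (codon & mask) == codon;
}
```

With these helpers, codon_bits("cat") yields 0x218 and codon_bits("atg") yields 0x184, the two constants used in the CAT example, and matches_mask(codon_bits("cag"), 0x25C) is true because 0x25C = 0010 0101 1100 encodes [C] [A or G] [G or T].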
+0
-120
debian/tigr-run-glimmer3.sgml
0 <!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
1
2 <!-- Process this file with docbook-to-man to generate an nroff manual
3 page: `docbook-to-man manpage.sgml > manpage.1'. You may view
4 the manual page with: `docbook-to-man manpage.sgml | nroff -man |
5 less'. A typical entry in a Makefile or Makefile.am is:
6
7 manpage.1: manpage.sgml
8 docbook-to-man $< > $@
9
10
11 The docbook-to-man binary is found in the docbook-to-man package.
12 Please remember that if you create the nroff version in one of the
13 debian/rules file targets (such as build), you will need to include
14 docbook-to-man in your Build-Depends control field.
15
16 -->
17
18 <!-- Fill in your name for FIRSTNAME and SURNAME. -->
19 <!ENTITY dhfirstname "<firstname>Steffen</firstname>">
20 <!ENTITY dhsurname "<surname>Möller</surname>">
21 <!-- Please adjust the date whenever revising the manpage. -->
22 <!ENTITY dhdate "<date>November 10, 2004</date>">
23 <!ENTITY dhsection "<manvolnum>1</manvolnum>">
24 <!ENTITY dhemail "<email>moeller@pzr.uni-rostock.de</email>">
25 <!ENTITY dhusername "Steffen Moeller">
26 <!ENTITY dhucpackage "<refentrytitle>TIGR-GLIMMER</refentrytitle>">
27 <!ENTITY dhpackage "tigr-glimmer">
28
29 <!ENTITY debian "<productname>Debian</productname>">
30 <!ENTITY gnu "<acronym>GNU</acronym>">
31 <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
32 ]>
33
34 <refentry>
35 <refentryinfo>
36 <address>
37 &dhemail;
38 </address>
39 <author>
40 &dhfirstname;
41 &dhsurname;
42 </author>
43 <copyright>
44 <year>2003</year>
45 <holder>&dhusername;</holder>
46 </copyright>
47 &dhdate;
48 </refentryinfo>
49 <refmeta>
50 &dhucpackage;
51
52 &dhsection;
53 </refmeta>
54 <refnamediv>
55 <refname>&dhpackage;</refname>
56
57 <refpurpose>
58 Apply the suite of programs within glimmer3 to a prokaryotic or archaeal genome.
59 </refpurpose>
60 </refnamediv>
61 <refsynopsisdiv>
62 <cmdsynopsis>
63 <command>tigr-run-glimmer3</command>
64 </cmdsynopsis>
65 </refsynopsisdiv>
66 <refsect1>
67 <title>DESCRIPTION</title>
68 <para>
69 A shell script that wraps a set of tigr-* utilities of the glimmer package to retrieve coding regions.
70 </para>
71 </refsect1>
72 <refsect1>
73 <title>SEE ALSO</title>
74 <para>
75 tigr-glimmer3 (1),
76 tigr-adjust (1),
77 tigr-anomaly (1),
78 tigr-build-icm (1),
79 tigr-check (1),
80 tigr-codon-usage (1),
81 tigr-compare-lists (1),
82 tigr-extract (1),
83 tigr-generate (1),
84 tigr-get-len (1),
85 tigr-get-putative (1),
86 tigr-long-orfs (1),
87 </para>
88 <para>
89 http://www.tigr.org/software/glimmer/
90 </para>
91
92 <para>Please see the readme in /usr/share/doc/tigr-glimmer for a description of how to use Glimmer3.</para>
93 </refsect1>
94 <refsect1>
95 <title>AUTHOR</title>
96
97 <para>This manual page was quickly copied from the glimmer web site by &dhusername; &dhemail; for
98 the &debian; system.
99 </para>
100
101 </refsect1>
102 </refentry>
103
104 <!-- Keep this comment at the end of the file
105 Local variables:
106 mode: sgml
107 sgml-omittag:t
108 sgml-shorttag:t
109 sgml-minimize-attributes:nil
110 sgml-always-quote-attributes:t
111 sgml-indent-step:2
112 sgml-indent-data:t
113 sgml-parent-document:nil
114 sgml-default-dtd-file:nil
115 sgml-exposed-tags:nil
116 sgml-local-catalogs:nil
117 sgml-local-ecat-files:nil
118 End:
119 -->
+0
-37
debian/tigr.1
0 .TH TIGR 1 "April 16, 2008"
1 .SH NAME
2 tigr \- runs various programs of the TIGR Glimmer suite
3 .SH SYNOPSIS
4 .B tigr
5 .B program
6 [arguments]
7 .SH DESCRIPTION
8 This manual page documents briefly the
9 .B tigr
10 wrapper to the TIGR Glimmer programs.
11 This manual page was written for the Debian GNU/Linux distribution
12 because upstream does not provide this wrapper and it was invented
13 for Debian to avoid conflicts with other packages that might cause
14 namespace pollution.
15 .PP
16 \fBtigr\fP is just a wrapper that invokes the various programs in the
17 TIGR Glimmer software package. You can get more detailed documentation
18 in /usr/share/doc/tigr-glimmer where
19 you find one helpfile for each program.
20 .PP
21 The following programs are included: clique, contrast, dnainvar, dnamove, dollop,
22 drawgram, fitch, mix, penny, restml, consense, dnacomp, dnaml, dnapars, dolmove, drawtree,
23 gendist, move, protdist, retree, contml, dnadist, dnamlk, dnapenny, dolpenny, factor,
24 kitsch, neighbor, protpars, and seqboot.
25 .SH OPTIONS
26 There are no options.
27 .SH EXAMPLES
28 .IP phylip\ dnapenny
29 .IP phylip\ factor
30 .SH SEE ALSO
31 .BR treetool (1),
32 .br
33 .SH AUTHOR
34 This manual page was written by Stephane Bortzmeyer <bortzmeyer@debian.org>
35 and Dr. Guenter Bechly <gbechly@debian.org>, for the Debian GNU/Linux system
36 (but may be used by others).