.TH "TIGR-BUILD-ICM" "1" 
.SH "NAME" 
tigr-build-icm \- creates and outputs an interpolated Markov model (IMM) 
.SH "SYNOPSIS" 
.PP 
\fBtigr-build-icm\fR 
.SH "DESCRIPTION" 
.PP 
The program build-icm creates and outputs an interpolated Markov 
model (IMM) as described in the paper 
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. 
Improved Microbial Gene Identification with Glimmer. 
Nucleic Acids Research, 1999, in press. 
Please reference this paper if you use the system as part of any 
published research. 
.PP 
Input comes from the file named on the command-line.  Format should be 
one string per line.  Each line has an ID string followed by white space 
followed by the sequence itself.  The script run-glimmer3 generates 
an input file in the correct format using the 'extract' program. 
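.PP 
As an illustration of the input format described above, the following 
minimal Python sketch splits each line into an ID and a sequence. It is 
not part of Glimmer; the function name parse_training_lines and the sample 
IDs are hypothetical, and skipping blank lines is an assumption rather than 
documented behaviour. 
.PP 
.nf 
```python
def parse_training_lines(lines):
    """Yield (seq_id, sequence) pairs from 'ID <whitespace> sequence' lines."""
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError("expected 'ID sequence', got: " + repr(raw))
        seq_id, sequence = fields
        yield seq_id, sequence


# Hypothetical training lines, for illustration only.
example = ["orf00001  atggctaaataa", "", "orf00002 ATGCCCTGA"]
records = list(parse_training_lines(example))
```
.fi 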
.PP 
The IMM is constructed as follows: For a given context, say 
acgtta, we want to estimate the probability distribution of the 
next character.  We shall do this as a linear combination of the 
observed probability distributions for this context and all of 
its suffixes, i.e., cgtta, gtta, tta, ta, a and empty.  By 
observed distributions I mean the counts of the number of 
occurrences of these strings in the training set.  The linear 
combination is determined by a set of probabilities, lambda, one 
for each context string.  For context acgtta the linear combination 
coefficients are: 
.PP 
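.nf 
lambda (acgtta) 
(1 \- lambda (acgtta)) x lambda (cgtta) 
(1 \- lambda (acgtta)) x (1 \- lambda (cgtta)) x lambda (gtta) 
(1 \- lambda (acgtta)) x (1 \- lambda (cgtta)) x (1 \- lambda (gtta)) x lambda (tta) 
(1 \- lambda (acgtta)) x (1 \- lambda (cgtta)) x (1 \- lambda (gtta)) x (1 \- lambda (tta)) x lambda (ta) 
(1 \- lambda (acgtta)) x (1 \- lambda (cgtta)) x (1 \- lambda (gtta)) x (1 \- lambda (tta)) x (1 \- lambda (ta)) x lambda (a) 
(1 \- lambda (acgtta)) x (1 \- lambda (cgtta)) x (1 \- lambda (gtta)) x (1 \- lambda (tta)) x (1 \- lambda (ta)) x (1 \- lambda (a)) 
.fi 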
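.PP 
The coefficients for a context and its suffixes can be sketched in a few 
lines of Python. This is an illustration of the linear combination only, 
not the code in build-icm; the function names and the lambda values in the 
example are hypothetical. 
.PP 
.nf 
```python
def suffix_contexts(context):
    """Return the context followed by all of its suffixes, ending with ''."""
    return [context[i:] for i in range(len(context) + 1)]


def interpolation_weights(context, lam):
    """Return (suffix, weight) pairs for the linear combination.

    lam maps each non-empty context string to its lambda in [0, 1].
    The empty context receives whatever weight is left over.
    """
    weights = []
    remaining = 1.0  # product of (1 - lambda) over the longer contexts
    for ctx in suffix_contexts(context):
        if ctx == "":
            weights.append((ctx, remaining))
        else:
            weights.append((ctx, remaining * lam[ctx]))
            remaining *= 1.0 - lam[ctx]
    return weights


# Hypothetical lambda values, chosen only for illustration.
lam = {"acgtta": 0.5, "cgtta": 0.5, "gtta": 0.0,
       "tta": 1.0, "ta": 0.3, "a": 0.2}
weights = interpolation_weights("acgtta", lam)
```
.fi 
.PP 
Because each coefficient takes a share of the weight left over by the 
longer contexts, the coefficients always sum to 1. In this example the 
lambda of 1.0 for tta leaves no weight for ta, a or the empty context. 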
.PP 
We compute the lambda values for each context as follows: 
.IP \(bu 2 
If the number of observations in the training set is >= the constant 
SAMPLE_SIZE_BOUND, the lambda for that context is 1.0. 
.IP \(bu 2 
Otherwise, do a chi-square test on the observations for this context 
compared to the distribution predicted for the one-character shorter 
suffix context. 
If the chi-square significance < 0.5, set the lambda for this context to 0.0. 
Otherwise set the lambda for this context to 
(chi-square significance) x (# observations) / SAMPLE_WEIGHT. 
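.PP 
The rule above can be written as a short Python sketch. It takes the 
chi-square significance as an input rather than computing it, and the 
function name and the values passed for SAMPLE_SIZE_BOUND and SAMPLE_WEIGHT 
in the example are placeholders, not the constants compiled into build-icm. 
.PP 
.nf 
```python
def lambda_for_context(num_obs, chi2_significance,
                       sample_size_bound, sample_weight):
    """Return the lambda for one context, following the rule above."""
    if num_obs >= sample_size_bound:
        return 1.0  # enough observations: trust this context fully
    if chi2_significance < 0.5:
        return 0.0  # no better than the shorter suffix context
    return chi2_significance * num_obs / sample_weight


# Placeholder constants, chosen only for illustration.
BOUND = 400
WEIGHT = 400

well_sampled = lambda_for_context(500, 0.1, BOUND, WEIGHT)
not_significant = lambda_for_context(100, 0.3, BOUND, WEIGHT)
partial = lambda_for_context(100, 0.8, BOUND, WEIGHT)
```
.fi 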
.PP 
To run the program: 
.PP 
tigr-build-icm < train.seq > train.model 
.PP 
This will use the training data in train.seq to produce the file 
train.model, containing your IMM. 
.SH "SEE ALSO" 
.PP 
tigr-glimmer3 (1), 
tigr-long-orfs (1), 
tigr-adjust (1), 
tigr-anomaly (1), 
tigr-extract (1), 
tigr-check (1), 
tigr-codon-usage (1), 
tigr-compare-lists (1), 
tigr-generate (1), 
tigr-get-len (1), 
tigr-get-putative (1) 
.PP 
http://www.tigr.org/software/glimmer/ 
.PP 
Please see the readme in /usr/share/doc/tigr-glimmer for a description of how to use Glimmer3. 
.SH "AUTHOR" 
.PP 
This manual page was copied from the Glimmer web site and readme file by 
Steffen Moeller <moeller@debian.org> for the \fBDebian\fP system. 
 
.\" created by instant / docbook-to-man