Data Recoding
To aid computation, CodonW automatically converts sequence information 
from its original text format into a numerical format. 
This is normally transparent to the user. To add additional genetic 
codes or a personal choice of codon values for calculating the Fop, 
CAI or CBI indices, some understanding of the scheme used to convert 
the sequences to numerical strings is advisable. 

The indices Fop, CBI and CAI are measures of codon bias relative to 
the codon usage of a set of optimal genes. When calculating them, 
there is an option of using a personal choice of codon values. These 
are read from a file; there must be one value for each codon (64 in 
total), and they must appear in the file in a set sequence (i.e. 
the numerical order of the codons, TTT, TCT ... GAG, GGG). This is 
also the order in which codon and amino acid results are written to 
file.
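
As a rough illustration of that ordering, the sketch below lists the 
64 codons in numerical order and pairs them with a list of values. 
It is not CodonW's own code: the function names are hypothetical, and 
the values are assumed to have been parsed from the file already, 
since the file layout is not described here.

```python
# Nucleotide order used by the recoding scheme: T/U=1, C=2, A=3, G=4.
BASES = "TCAG"


def codons_in_numerical_order():
    """Return the 64 codons in code order 1..64 (TTT, TCT, ..., GAG, GGG)."""
    # Position 1 varies slowest, position 3 next, position 2 fastest.
    return [p1 + p2 + p3 for p1 in BASES for p3 in BASES for p2 in BASES]


def pair_values_with_codons(values):
    """Pair 64 values (hypothetical: already parsed from a file) with codons."""
    codons = codons_in_numerical_order()
    if len(values) != len(codons):
        raise ValueError(f"expected {len(codons)} values, got {len(values)}")
    return dict(zip(codons, values))


order = codons_in_numerical_order()
print(order[:4], order[-2:])
# ['TTT', 'TCT', 'TAT', 'TGT'] ['GAG', 'GGG']
```

Checking the first and last few codons against the expected sequence 
is a quick way to confirm that a values file is in the right order.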

Internally, CodonW recodes all nucleotides, codons and amino acids. 
Nucleotides are recoded as T/U=1, C=2, A=3, G=4. The 20 standard 
amino acids and the termination codons are recoded as integer values 
in the range 1 to 21; note that stop codons are assigned the amino 
acid value 11 (see Table 2). The decision about whether a codon is 
synonymous, and how many members are in a particular synonymous 
amino acid family, is made at run time and depends on the genetic 
code chosen.  
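
To show why family sizes depend on the genetic code, the sketch below 
counts the codons per amino acid for two codes. It is an illustration 
only, not CodonW's implementation. The universal code is written as a 
64-character string in the codon order of Table 1. The vertebrate 
mitochondrial reassignments (UGA=Trp, AUA=Met, AGA and AGG=stop) are 
taken from the standard NCBI translation table 2, not from this 
document, and all names are hypothetical.

```python
from collections import Counter

# One-letter amino acids for codon codes 1..64, in Table 1 order
# (UUU, UCU, UAU, UGU, UUC, ...). '*' marks a stop codon.
UNIVERSAL = ("FSYC" "FSYC" "LS**" "LS*W"
             "LPHR" "LPHR" "LPQR" "LPQR"
             "ITNS" "ITNS" "ITKR" "MTKR"
             "VADG" "VADG" "VAEG" "VAEG")


def with_changes(code, changes):
    """Return a copy of a genetic code with some codon codes reassigned."""
    table = list(code)
    for codon_code, amino_acid in changes.items():
        table[codon_code - 1] = amino_acid  # codon codes are 1-based
    return "".join(table)


# Assumption (NCBI table 2): UGA (12) = Trp, AUA (41) = Met,
# AGA (44) and AGG (48) = stop.
VERTEBRATE_MITO = with_changes(UNIVERSAL, {12: "W", 41: "M", 44: "*", 48: "*"})


def family_sizes(code):
    """Count how many codons encode each amino acid under a genetic code."""
    return Counter(code)


universal_sizes = family_sizes(UNIVERSAL)
mito_sizes = family_sizes(VERTEBRATE_MITO)
print(universal_sizes["W"], mito_sizes["W"])  # 1 2
print(universal_sizes["R"], mito_sizes["R"])  # 6 4
```

Under the universal code Trp has a single codon, so it has no 
synonymous codons; under the vertebrate mitochondrial code it 
belongs to a two-member family.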

Each codon is recoded into an integer value in the range 1 to 64 
(see Table 1). The formula used to recode the codons is:

Equation 1
        	
code=((P1-1)*16)+P2+((P3-1)*4)    1<= code <= 64

Where each of the three codon positions is represented by P1, P2 and 
P3. Using this recoding convention, the codon ATG has the value 45. 
 		
code=((3-1)*16)+1+((4-1)*4)=45

Unrecognised or non-translatable bases, codons or amino acids are 
all assigned the value zero.




Table 1 Numerical values used for recoding codons 

Code	Codon	AA	Code	Codon	AA	Code	Codon	AA	Code	Codon	AA
1	UUU 	Phe	2	UCU 	Ser	3	UAU 	Tyr	4	UGU 	Cys
5	UUC		6	UCC 		7	UAC		8	UGC 	
9	UUA 	Leu	10	UCA		11	UAA 	STOP	12	UGA 	STOP
13	UUG		14	UCG 		15	UAG 		16	UGG 	Trp
17	CUU		18	CCU 	Pro	19	CAU 	His	20	CGU	Arg
21	CUC		22	CCC 		23	CAC		24	CGC	
25	CUA 		26	CCA		27	CAA	Gln	28	CGA 	
29	CUG 		30	CCG 		31	CAG 		32	CGG 	
33	AUU 	Ile	34	ACU 	Thr	35	AAU 	Asn	36	AGU 	Ser
37	AUC		38	ACC 		39	AAC		40	AGC 	
41	AUA 		42	ACA 		43	AAA	Lys	44	AGA 	Arg
45	AUG 	Met	46	ACG 		47	AAG 		48	AGG 	
49	GUU	Val	50	GCU	Ala	51	GAU 	Asp	52	GGU	Gly
53	GUC 		54	GCC 		55	GAC		56	GGC 	
57	GUA 		58	GCA 		59	GAA	Glu	60	GGA 	
61	GUG 		62	GCG 		63	GAG 		64	GGG 	



Table 2 Numerical values used to recode amino acids.
Code	AA	One letter code	Code	AA	One letter code
1	Phe	F	2	Leu	L
3	Ile	I	4	Met	M
5	Val	V	6	Ser	S
7	Pro	P	8	Thr	T
9	Ala	A	10	Tyr	Y
11	Stop	*	12	His	H
13	Gln	Q	14	Asn	N
15	Lys	K	16	Asp	D
17	Glu	E	18	Cys	C
19	Trp	W	20	Arg	R
21	Gly	G