Data Recoding
To aid computation, CodonW automatically converts sequence information
from its original text format into a numerical format. This is
normally transparent to the user. To add additional genetic codes, or
a personal choice of codon values for calculating the Fop, CAI or CBI
indices, some understanding of the scheme used to convert the
sequences to numerical strings is advisable.
The indices Fop, CBI and CAI are measures of codon bias in relation to
the codon usage of a set of optimal genes. When calculating them,
there is an option of using a personal choice of these values. The
values are read from a file: there must be one value for each codon
(64 in total), and they must appear in the file in a set sequence
(i.e. the numerical order of the codons, TTT, TCT ... GAG, GGG). This
is also the order in which codon and amino acid results are recorded
to file.
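The required ordering can be sketched in Python as follows. This is an
illustration only, not CodonW's own code: the names
codons_in_file_order and parse_codon_values are hypothetical, and the
sketch assumes the 64 values are separated by whitespace, which the
text above does not specify.

```python
BASES = "TCAG"  # position in this string + 1 is the value: T=1, C=2, A=3, G=4


def codons_in_file_order():
    """Return the 64 codons in numerical order (TTT, TCT ... GAG, GGG)."""
    codons = []
    for first in BASES:            # first position changes slowest
        for third in BASES:        # then the third position
            for second in BASES:   # second position changes fastest
                codons.append(first + second + third)
    return codons


def parse_codon_values(text):
    """Pair each codon with its value (assumes whitespace-separated numbers)."""
    values = [float(token) for token in text.split()]
    if len(values) != 64:
        raise ValueError(f"expected 64 values, found {len(values)}")
    return dict(zip(codons_in_file_order(), values))
```

Note that the second codon position varies fastest, which is why TCT
follows TTT rather than TTC.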
Internally CodonW recodes all nucleotides, codons and amino acids.
Nucleotides are recoded as T/U=1, C=2, A=3, G=4. The 20 standard
amino acids and the termination codons are recoded as integer values
in the range 1 to 21; note that stop codons are assigned the amino
acid value 11 (see Table 2). The decision about whether a codon is
synonymous, and how many members are in a particular amino acid
synonymous family, is taken at run time and depends on the genetic
code chosen.
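To illustrate why family sizes cannot be fixed in advance, the sketch
below counts synonymous family members under two genetic codes. It is
not CodonW's implementation: the names family_sizes and is_synonymous
are hypothetical, and the two code strings follow the NCBI
translation-table layout (bases cycling T, C, A, G with the third
position fastest), which differs from CodonW's internal codon order.

```python
from collections import Counter
from itertools import product

# One amino acid letter per codon, in NCBI order (TTT, TTC, TTA, TTG, TCT ...).
UNIVERSAL = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

# Codons in the same NCBI order as the strings above.
CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]


def family_sizes(code_string):
    """Count how many codons encode each amino acid (or stop, '*')."""
    return Counter(code_string)


def is_synonymous(codon, code_string):
    """A codon is synonymous if another codon encodes the same amino acid."""
    amino_acid = code_string[CODONS.index(codon)]
    return family_sizes(code_string)[amino_acid] > 1
```

Under the universal code ATG is the only Met codon, so it is
non-synonymous; under the vertebrate mitochondrial code ATA also
encodes Met, so ATG becomes a member of a two-codon family.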
Each codon is recoded into an integer value in the range 1 to 64
(see Table 1). The formula used to recode the codons is:
Equation 1
code = ((P1-1)*16) + P2 + ((P3-1)*4)        1 <= code <= 64
Where P1, P2 and P3 are the numerical values of the nucleotides at
the first, second and third codon positions. Using this recoding
convention, the codon ATG (A=3, T=1, G=4) has the value 45:
code = ((3-1)*16) + 1 + ((4-1)*4) = 45
Unrecognised or non-translatable bases, codons or amino acids are
all assigned the value zero.
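The codon recoding described above can be sketched in Python as
follows. This is a minimal illustration rather than CodonW's actual
source: the function name codon_code is hypothetical, and accepting
lower-case input is an assumption of the sketch.

```python
# Nucleotide values as given in the text: T/U=1, C=2, A=3, G=4.
NUCLEOTIDE_CODE = {"T": 1, "U": 1, "C": 2, "A": 3, "G": 4}


def codon_code(codon):
    """Return the codon's integer code (1 to 64), or 0 if it cannot be recoded."""
    codon = codon.upper()  # assumption: treat lower-case bases like upper-case
    if len(codon) != 3:
        return 0
    p1, p2, p3 = (NUCLEOTIDE_CODE.get(base, 0) for base in codon)
    if 0 in (p1, p2, p3):
        return 0  # at least one unrecognised base, e.g. N
    return (p1 - 1) * 16 + p2 + (p3 - 1) * 4


print(codon_code("ATG"))  # 45, matching the worked example
print(codon_code("ANG"))  # 0, because N is not a recognised base
```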
Table 1 Numerical values used for recoding codons
Code Codon AA     Code Codon AA     Code Codon AA     Code Codon AA
1    UUU   Phe    2    UCU   Ser    3    UAU   Tyr    4    UGU   Cys
5    UUC   Phe    6    UCC   Ser    7    UAC   Tyr    8    UGC   Cys
9    UUA   Leu    10   UCA   Ser    11   UAA   STOP   12   UGA   STOP
13   UUG   Leu    14   UCG   Ser    15   UAG   STOP   16   UGG   Trp
17   CUU   Leu    18   CCU   Pro    19   CAU   His    20   CGU   Arg
21   CUC   Leu    22   CCC   Pro    23   CAC   His    24   CGC   Arg
25   CUA   Leu    26   CCA   Pro    27   CAA   Gln    28   CGA   Arg
29   CUG   Leu    30   CCG   Pro    31   CAG   Gln    32   CGG   Arg
33   AUU   Ile    34   ACU   Thr    35   AAU   Asn    36   AGU   Ser
37   AUC   Ile    38   ACC   Thr    39   AAC   Asn    40   AGC   Ser
41   AUA   Ile    42   ACA   Thr    43   AAA   Lys    44   AGA   Arg
45   AUG   Met    46   ACG   Thr    47   AAG   Lys    48   AGG   Arg
49   GUU   Val    50   GCU   Ala    51   GAU   Asp    52   GGU   Gly
53   GUC   Val    54   GCC   Ala    55   GAC   Asp    56   GGC   Gly
57   GUA   Val    58   GCA   Ala    59   GAA   Glu    60   GGA   Gly
61   GUG   Val    62   GCG   Ala    63   GAG   Glu    64   GGG   Gly
Table 2 Numerical values used to recode amino acids
Code  AA    One-letter code    Code  AA    One-letter code
1     Phe   F                  2     Leu   L
3     Ile   I                  4     Met   M
5     Val   V                  6     Ser   S
7     Pro   P                  8     Thr   T
9     Ala   A                  10    Tyr   Y
11    Stop  *                  12    His   H
13    Gln   Q                  14    Asn   N
15    Lys   K                  16    Asp   D
17    Glu   E                  18    Cys   C
19    Trp   W                  20    Arg   R
21    Gly   G
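
Table 2 can be expressed as a simple lookup, sketched below in Python.
The names AA_CODE and recode_amino_acids are hypothetical and are not
taken from CodonW; the sketch only mirrors the values in the table and
the rule that unrecognised symbols are assigned zero.

```python
# One-letter amino acid symbol -> integer code, copied from Table 2.
AA_CODE = {
    "F": 1, "L": 2, "I": 3, "M": 4, "V": 5, "S": 6, "P": 7,
    "T": 8, "A": 9, "Y": 10, "*": 11, "H": 12, "Q": 13, "N": 14,
    "K": 15, "D": 16, "E": 17, "C": 18, "W": 19, "R": 20, "G": 21,
}


def recode_amino_acids(protein):
    """Convert a one-letter protein string to integer codes (0 = unrecognised)."""
    return [AA_CODE.get(symbol, 0) for symbol in protein.upper()]


print(recode_amino_acids("MKW*X"))  # [4, 15, 19, 11, 0]
```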