README.coa

The permanent result files from a COA created by CodonW have the extension 
“.coa”. For a description of these files and their contents see Table 1.

Short description of output files created by correspondence analysis in
CodonW.

summary.coa
This file contains a summary of all the information generated by 
correspondence analysis, including all the data written to files listed 
below, except for the output written to cusort.coa. 

eigen.coa
Each axis generated in the correspondence analysis is represented by a row 
of information. Each row consists of four columns: (1) the number of the 
axis, (2) the axis eigenvalue, (3) the relative inertia of the axis, and (4) 
the cumulative sum of the relative inertia. 

amino.coa† or codon.coa
Each codon or amino acid included in the correspondence analysis is 
represented by a row. The first column is a description of the variable; the 
subsequent columns contain the coordinates of the codon or amino acid on the 
axes. The number of axes is user definable.

genes.coa
Each row represents one gene. The first column contains a unique description 
for each gene, and subsequent columns contain the coordinates on each of 
the recorded axes. If additional genes are added to the correspondence 
analysis (advanced correspondence analysis option), the coordinates of these 
genes are appended to this file.

cusort.coa†
Contains the codon usage of each gene, sorted by the gene’s coordinate on 
the principal axis. This information is used to generate the table in 

hilo.coa
This file records a two-way chi-squared contingency test between two subsets 
of genes (as defined by the “advanced correspondence analysis options”) 
positioned at the extremes of axis 1 (cusort.coa). 

cai.coa†
Contains the relative usage of each codon within each synonym family: the 
most frequent codon is assigned the value one and all other codons are 
expressed relative to it. This file can be used to calculate 
species-specific CAI values. 

fop.coa† and cbi.coa†
Contain a list of the optimal codons and non-optimal codons as identified 
in the file “hilo.coa”. The format of these files can be utilised by CodonW 
to calculate Fop and CBI using a specific choice of optimal codons.

inertia.coa
This file is only generated if the exhaustive output option is selected 
under the advanced correspondence analysis menu. It contains four tables of 
information. The first two report the absolute contribution of each gene and 
codon (or amino acid) to the inertia explained by each axis. The second two 
tables report the fraction of variation in each gene and codon (or amino 
acid) explained by each axis. 

codon.coa and hilo.coa are not generated during the correspondence analysis
of amino acids.


Detailed explanation of file contents


summary.coa
========================================
Correspondence analysis generates a large volume of data. CodonW writes the 
essential data necessary to interpret the correspondence analysis to the 
file “summary.coa”.

genes.coa codon.coa amino.coa
========================================
The most complex analysis that CodonW performs is correspondence analysis 
(COA). COA creates a series of orthogonal axes to identify trends that 
explain the data variation, with each subsequent axis explaining a 
decreasing amount of the variation. COA positions each gene and codon (or 
amino acid) on these axes. An important property is that the ordinations of 
the rows (genes) and columns (codons or amino acids) are superimposable. 


eigen.coa
========================================
The eigenvalues of the principal trends, as well as the more accessible 
fraction (with the cumulative total) of the total data inertia that each 
axis explains, are recorded to summary.coa and eigen.coa. 
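
The relationship between the eigenvalues and the inertia columns can be 
sketched as follows. This is a minimal illustration, not CodonW's own code: 
the function name inertia_table is hypothetical, the eigenvalues are made 
up, and the sketch assumes the list holds the eigenvalue of every axis, so 
that their sum is the total inertia.

```python
def inertia_table(eigenvalues):
    """Return rows of (axis, eigenvalue, relative inertia, cumulative inertia).

    Assumes `eigenvalues` lists every axis, so the sum is the total inertia.
    """
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for axis, value in enumerate(eigenvalues, start=1):
        relative = value / total      # fraction of total inertia on this axis
        cumulative += relative        # running total over axes 1..axis
        rows.append((axis, value, relative, cumulative))
    return rows


# Made-up eigenvalues, for illustration only.
for row in inertia_table([0.30, 0.15, 0.05]):
    print("%d %.4f %.4f %.4f" % row)
```

Each printed row has the same four columns described for eigen.coa: axis 
number, eigenvalue, relative inertia and cumulative relative inertia.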


cusort.coa 
======================================== 
To simplify the analysis of codon usage, CodonW assumes that the principal 
trend is correlated with gene expression. It uses this assumption to 
identify putative optimal codons. The adage GIGO (“garbage in, garbage 
out”) must be stressed, though: it is the researcher's responsibility to 
establish that the principal trend is correlated with gene expression (see 
the tutorial for some examples of how to do this).

To identify the putative optimal codons, the genes are sorted according to 
their position on the principal axis, and the sorted codon usage of these 
genes is written to the file “cusort.coa”. Then a number of genes, decided 
by the advanced correspondence analysis menu option “number of genes used 
to identify optimal codons”, are read from the start and end of this file 
(i.e. equivalent to the extremes of the principal axis), and the codon usage 
of each set of genes is totalled. The set of genes with the lower Nc (more 
highly biased) is putatively identified as the more highly expressed.  
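
The sorting and totalling steps described above can be sketched as follows. 
This is an illustration under stated assumptions, not CodonW's 
implementation: the function name extreme_gene_sets and the in-memory tuple 
layout are hypothetical, and the Nc comparison that decides which set is 
the highly expressed one is not shown.

```python
from collections import Counter


def extreme_gene_sets(genes, n):
    """Total the codon usage of the n genes at each end of axis 1.

    genes: list of (name, axis1_coordinate, codon_counts) tuples, where
    codon_counts maps a codon such as "CTG" to its count in that gene.
    Returns (totals_at_start, totals_at_end) as Counter objects.
    """
    if n < 1 or 2 * n > len(genes):
        raise ValueError("n must be at least 1 and the two sets must not overlap")
    # Sort by coordinate on the principal axis (the order used in cusort.coa).
    ordered = sorted(genes, key=lambda gene: gene[1])
    totals = []
    for subset in (ordered[:n], ordered[-n:]):
        total = Counter()
        for _name, _coordinate, counts in subset:
            total.update(counts)  # adds the counts codon by codon
        totals.append(total)
    return totals[0], totals[1]


# Made-up data, for illustration only.
genes = [
    ("g1", -0.8, {"CTG": 10, "CTA": 1}),
    ("g2", 0.1, {"CTG": 4, "CTA": 4}),
    ("g3", 0.9, {"CTG": 1, "CTA": 9}),
    ("g4", -0.5, {"CTG": 8, "CTA": 2}),
]
start, end = extreme_gene_sets(genes, 1)
print(dict(start), dict(end))
```

The two totals would then be compared by Nc to decide which end of the axis 
is putatively the highly expressed one.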

hilo.coa
======================================== 
Optimal codons are defined as those codons that occur significantly more 
often in highly expressed genes relative to their frequency in lowly 
expressed genes. Significance is assessed by a two-way chi-squared 
contingency test with the criterion of p < 0.01. The advantage of using a 
test of significance to identify optimal codons is that variation in codon 
usage between highly and lowly expressed genes that is due to random noise 
is suppressed. A disadvantage is that the test is dependent on sample 
size.  
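
A two-way contingency test of this kind can be sketched as follows. The 
sketch assumes a 2 x 2 table for each codon (this codon versus the other 
codons of its synonymous family, in the high set versus the low set) and a 
Pearson chi-squared statistic with no continuity correction; whether CodonW 
builds the table in exactly this way is an assumption. The function names 
are hypothetical. The critical values are the standard ones for one degree 
of freedom.

```python
CHI2_CRIT_P01 = 6.635  # chi-squared critical value, 1 d.f., p = 0.01
CHI2_CRIT_P05 = 3.841  # chi-squared critical value, 1 d.f., p = 0.05


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]] (no correction)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column: no evidence either way
    return n * (a * d - b * c) ** 2 / denominator


def classify_codon(high_codon, high_family, low_codon, low_family):
    """Return "*" (p < 0.01), "@" (0.01 < p < 0.05) or "" for one codon.

    high_codon / low_codon: count of the codon in each set of genes.
    high_family / low_family: count of all codons of its synonymous
    family (including this codon) in each set of genes.
    """
    if high_family == 0 or low_family == 0:
        return ""
    # Only codons that are more frequent in the high set are candidates.
    if high_codon / high_family <= low_codon / low_family:
        return ""
    chi2 = chi_squared_2x2(high_codon, high_family - high_codon,
                           low_codon, low_family - low_codon)
    if chi2 > CHI2_CRIT_P01:
        return "*"
    if chi2 > CHI2_CRIT_P05:
        return "@"
    return ""


# Made-up counts, for illustration only.
print(repr(classify_codon(90, 100, 40, 100)))
print(repr(classify_codon(60, 100, 45, 100)))
print(repr(classify_codon(40, 100, 90, 100)))
```

Because the statistic scales with the counts, the same difference in 
relative usage can pass or fail the test depending on how many genes are 
in each set, which is the sample-size dependence noted above.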

After CodonW does a two-way chi-squared test on the genes taken from the 
extremes of axis 1, their codon usage and RSCU are output as a table to 
“summary.coa” and “hilo.coa”. Those codons which have been putatively 
identified as optimal (p < 0.01) are indicated with an asterisk (*). Though 
not considered optimal by CodonW, codons that occur more frequently in the 
highly expressed dataset at 0.01 < p < 0.05 are indicated with an at sign 
(@). 
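
For reference, RSCU (relative synonymous codon usage) is conventionally 
defined as the observed count of a codon divided by the count expected if 
all codons of its synonymous family were used equally. A minimal sketch of 
that standard definition follows; the function name rscu is hypothetical 
and the sketch is not taken from CodonW.

```python
def rscu(family_counts):
    """RSCU for each codon of one synonymous family.

    family_counts: dict mapping each codon of the family to its count.
    A value of 1.0 means the codon is used as often as expected under
    equal usage of the synonymous codons.
    """
    total = sum(family_counts.values())
    size = len(family_counts)
    if total == 0:
        return {codon: 0.0 for codon in family_counts}
    # observed / expected, where expected = total / size
    return {codon: count * size / total
            for codon, count in family_counts.items()}


# Made-up counts for a two-codon family, for illustration only.
print(rscu({"AAA": 30, "AAG": 10}))
```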


fop.coa cbi.coa cai.coa
======================================== 
CodonW measures the degree to which the codon usage of a gene has adapted 
towards the usage of optimal codons. It does this by calculating three 
indices: the frequency of optimal codons (Fop), the codon bias index (CBI), 
and the codon adaptation index (CAI). To calculate these indices, 
information about codon usage in the species being analysed is needed. The 
indices Fop and CBI use the optimal codons for the species. The index CAI 
uses codon adaptiveness values.

For some species this information is known, and for these the optimal codons 
and codon adaptiveness values are built into CodonW (see the “Change 
Defaults” menu). For other species these indices cannot be calculated unless 
the additional information is known. During calculation of these indices the 
user is prompted for input files.

During a COA CodonW generates the output files “cai.coa”, “fop.coa” and 
“cbi.coa”. These files can be used as input files for their respective 
indices (they are already in the correct format). 

Again it must be stressed that CodonW must make a number of assumptions to 
generate these files. These are: that the major trend in the codon usage is 
correlated with expression level; that the dataset contains highly expressed 
genes; and that the genes used to identify optimal codons were highly 
expressed. If these assumptions are valid then the files “cbi.coa”, 
“cai.coa” and “fop.coa” can be used to calculate the indices CBI, CAI and 
Fop respectively.
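
The relative usage values stored in “cai.coa” (most frequent codon of each 
synonym family set to one, the others expressed relative to it) and their 
use in a CAI calculation can be sketched as follows. CAI is computed here 
as the geometric mean of the relative usage values of the codons in a 
gene, which is the standard definition. The function names are 
hypothetical, and skipping codons that have no value or a value of zero is 
an assumption of this sketch, not a statement of how CodonW treats them.

```python
import math


def relative_usage(family_counts):
    """Scale the counts of one synonym family so the top codon equals 1.0."""
    top = max(family_counts.values())
    if top == 0:
        return {codon: 0.0 for codon in family_counts}
    return {codon: count / top for codon, count in family_counts.items()}


def cai(gene_codons, w):
    """Geometric mean of the w values of the codons in a gene.

    gene_codons: list of codons in the gene, in any order.
    w: dict mapping codon to its relative usage value.
    Codons with no value or a value of zero are skipped (assumption).
    """
    logs = [math.log(w[codon]) for codon in gene_codons
            if w.get(codon, 0.0) > 0.0]
    if not logs:
        return 0.0
    return math.exp(sum(logs) / len(logs))


# Made-up counts for one synonym family, for illustration only.
w = relative_usage({"CTG": 50, "CTA": 5, "CTT": 10})
print(w)
print(cai(["CTG", "CTG", "CTA"], w))
```

A gene that uses only the most frequent codon of each family scores 1.0, 
and the score falls as less frequently used codons appear.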