Commit fb82b5f7a3c6a430565171c945c1684932d57584 - fasta3

+38

-17

README.md

0	0
1	1	## The FASTA package - protein and DNA sequence similarity searching and alignment programs
2	2
3		The FASTA (pronounced FAST-Aye, not FAST-Ah) programs are a
4		comprehensive set of similarity searching and alignment programs for
5		searching protein and DNA sequence databases. Like the BLAST programs `blastp` and `blastn`, the `fasta` program itself uses a rapid heuristic strategy for finding similar regions in protein and DNA sequences. But in
6		addition to heuristic similarity searching, the FASTA package provides
7		programs for rigorous local (`ssearch`) and global (`ggsearch`)
8		similarity searching, as well as a program for finding non-overlapping
9		sequence similarities (`lalign`). Like BLAST, the FASTA package also
10		includes programs for aligning translated DNA sequences against
11		proteins (`fastx`, `fasty` are equivalent to `blastx`, `tfastx`,
12		`tfasty` are similar to `tblastn`).
	3	The FASTA (pronounced FAST-Aye, not FAST-Ah) programs are a comprehensive set of similarity searching and alignment programs for searching protein and DNA sequence databases. Like the BLAST programs `blastp` and `blastn`, the `fasta` program itself uses a rapid heuristic strategy for finding similar regions in protein and DNA sequences. But in addition to heuristic similarity searching, the FASTA package provides
	4	programs for rigorous local (`ssearch`) and global (`ggsearch`) similarity searching, as well as a program for finding non-overlapping sequence similarities (`lalign`). Like BLAST, the FASTA package also includes programs for aligning translated DNA sequences against proteins (`fastx`, `fasty` are equivalent to `blastx`, and `tfastx`, `tfasty` are similar to `tblastn`).
13	5
14		####December, 2017
15		The current FASTA version is fasta-36.3.8f, Dec. 2017
	6	#### March, 2019
	7
	8	An updated release of the FASTA package (`fasta-36.3.8h`) is
	9	available. In addition to minor bug fixes, the latest version can
	10	generate query and library sequences using program scripts.
	11
	12	See doc/README_v36.3.8h.md and doc/readme.v36 for a more complete summary of changes.
	13
	14	#### December, 2018
	15
	16	The latest version of the FASTA package is `fasta-36.3.8h`, Dec. 2018.
	17
	18	See doc/README_v36.3.8h.md for a more complete summary of changes.
	19
	20	#### November, 2018
	21
	22	The current released version of the FASTA package is `fasta-36.3.8h`, Nov. 2018
	23
	24	See doc/README_v36.3.8h.md for a more complete summary of changes.
	25
	26	#### October, 2018
	27
	28	The current version of the FASTA package is fasta-36.3.8g, Oct. 2018
	29
	30	See doc/README_v36.3.8h.md for a more complete summary of changes.
	31
	32	#### April, 2018
	33	The current version of the FASTA package is fasta-36.3.8g, Apr. 2018
	34
	35	#### December, 2017
	36	The current FASTA version is fasta-36.3.8g, Dec. 2017
16	37
17	38	The statistics routines for normally distributed scores (ggsearch36,
18	39	glsearch36) are more robust to very low E()-value thresholds.
19	40
20		####Sept, 2017
	41	#### Sept, 2017
21	42	The current FASTA version is fasta-36.3.8f, Sept. 2017
22	43
23	44	If the -S option is used and a query sequence has no upper case
24	45	letters, it is re-read with lower-case letters converted to upper-case.
25	46
26		####May, 2017
	47	#### May, 2017
27	48	The current FASTA version is fasta-36.3.8f, May. 2017
28	49
29	50	Various bugs in sub-alignment scoring corrected and support for the
30		EBI SP:GSTM1_HUMAN P09488 added. The format for the $SRCH_URL and
31		$SRCH_URL2 format strings has changed to enable pairwise alignment.
	51	EBI SP:GSTM1_HUMAN P09488 added. The format for the `$SRCH_URL` and
	52	`$SRCH_URL2` format strings has changed to enable pairwise alignment.
32	53
33		####September, 2016
	54	#### September, 2016
34	55
35	56	The fasta-36.3.6e version includes a new directory, `psisearch2`, with
36	57	scripts to run iterative PSSM (PSI-BLAST or SSEARCH36) searches using

+0

-18

~~doc/README_v36.3.8g.md~~

0
1
2		## The FASTA package - protein and DNA sequence similarity searching and alignment programs
3
4		Changes in fasta-36.3.8f released 31-Dec-2017
5
6		1. (December, 2017) -- Make statistical thresholds more robust for
7		small E()-values with normally distributed scores (ggsearch36,
8		glsearch36).
9
10		2. (September, 2017) Treat all lower-case queries as uppercase with -S option.
11
12		3. (May, 2017) Improvements/fixes to sub-alignment scoring strategies.
13
14		4. Improvements/fixes to psisearch2 scripts.
15
16		For more detailed information, see `doc/readme.v36`.
17

+87

-0

doc/README_v36.3.8h.md

	0
	1	## The FASTA package - protein and DNA sequence similarity searching and alignment programs
	2
	3	Changes in fasta-36.3.8h August, 2019
	4
	5	1. Modifications to support makeblastdb format v5 databases. Currently, only simple database reads have been tested.
	6
	7
	8	Changes in fasta-36.3.8h March, 2019
	9
	10	1. Translation table 1 (`-t 1`) now translates 'TGA'->'U' (selenocysteine).
	11
	12	2. New script for extracting DNA sequences from genomes (`scripts/get_genome_seq.py`). Currently works with human (hg38), mouse (mm10), and rat (rn6).
	13
	14	Changes in fasta-36.3.8h January, 2019
	15
	16	1. Bug fixes: `fastx`/`tfastx` searches done with the `-t t` option (which adds a `*` to protein sequences so that termination codons can be matched) did not work properly with the `VT` series of matrices, particularly `VT10`. This has been fixed.
	17
	18	2. New features: Both query and library/subject sequences can be generated by specifying a program script, either by putting a `!` at the start of the query/subject file name, or by specifying library type `9`. Thus, `fasta36 \!../scripts/get_protein.py+P09488+P30711 /seqlib/swissprot.fa` or `fasta36 "../scripts/get_protein.py+P09488+P30711 9" /seqlib/swissprot.fa` will compare two query sequences, `P09488` and `P30711`, to SwissProt, by downloading them from Uniprot using the `get_protein.py` script (which can download sequences using either Uniprot or RefSeq protein accessions). Often, the leading `!` must be escaped from shell interpretation with `\!`.
	19
	20	New scripts that return FASTA sequences using accessions or genome coordinates are available in `scripts/`: `get_protein.py`, `get_uniprot.py`, `get_up_prot_iso_sql.py` and `get_refseq.py`. `get_refseq.py` can download either protein or mRNA RefSeq entries. `get_up_prot_iso_sql.py` retrieves a protein and its isoforms from a MySQL database.
	21
	22	`get_genome_seq.py` extracts genome sequences using coordinates from local reference genomes (`hg38` and `mm10` included by default).
	23
	24	Changes in fasta-36.3.8h December, 2018
	25
	26	The `scripts/ann_exons_up_www.pl` and `ann_exons_up_sql.pl` now include the option `--gen_coord` which provides the associated genome coordinate (including chromosome) as a feature, indicated by `'<'` (start of exon) and `'>'` (end of exon).
	27
	28	Changes in fasta-36.3.8h released November, 2018
	29
	30	fasta-36.3.8h provides new scripts and modifications to the `fasta` programs that normalize the process of merging sub-alignment scores and region information into both FASTA and BLAST results. To move BLASTP towards FASTA with respect to alignment annotation and sub-alignment scoring:
	31
	32	1. The `blastp_annot_cmd.sh` runs a blast search, finds and scores domain information for the alignments, and merges this information back into the blast output `.html` file. This script uses:
	33
	34	1. `annot_blast_btab2.pl --query query.file --ann_script annot_script.pl --q_ann_script annot_script.pl blast.btab_file > blast.btab_file_ann` (a blast tabular file with one or two new fields: an annotation field and, optionally with `--dom_info`, a raw domain content field).
	35	2. `merge_blast_btab.pl --btab blast.btab_file_ann blast.html > blast_ann.html` (merges the annotations and domain content information in the `blast.btab_file_ann` file with the standard blast output file to produce annotated alignments).
	36	3. In addition, `rename_exons.py` is available to rename exons (later other domains) in the subject sequences to match the exon labeling in the aligned query sequence.
	37	4. `relabel_domains.py` can be used to adjust color sets for homologous domains.
	38
	39	2. There is also an equivalent `fasta_annot_cmd.sh` script that provides similar functionality for the FASTA programs. This script does not need to use `annot_blast_btab2.pl` to produce domain subalignment scores (that functionality is provided in FASTA), but it also can use `merge_fasta_btab.pl` and `rename_exons.py` to modify the names of the aligned exons/domains in the subject sequences.
	40
	41	3. To support the independence of the `blastp`/`fasta` output from html annotation, the FASTA package includes some new options:
	42
	43	1. The `-m 8CBL` option includes query sequence length and subject sequence length in the blast tabular output. In addition, if domain annotations are available, the raw domain coordinates are provided in an additional field after the annotation/subalignment scoring field. `-m 8CBl` provides the sequence lengths, but does not add the raw domain coordinates.
	44
	45	2. The `-Xa` option prevents annotation information from being included in the html output; it is then only available in the `-m 8CB` (or `-m 8CBL/l`) output.
	46
	47	3. To reduce problems with spaces in script arguments, annotation scripts with spaces separating arguments can use '+' instead of ' '.
	48
	49	4. The `fasta_annot_cmd.sh` script produces both a conventional alignment on `stdout` and a `-m 8CBL` alignment that is sent to a separate file. The file name is separated from the `-m F8CBL` option with a `=`, thus `-m F8CBL=tmp_output.blast_tab`.
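
As a rough illustration of how the extra length fields might be consumed, the sketch below reads the leading columns of a `-m 8CBL` line. It assumes tab-separated fields in the order given in the guide (`qseqid qlen sseqid slen percid ...`); the remaining columns are kept unparsed because their layout is not spelled out here. The `Hit` type, the function name, and the sample identifiers are hypothetical and are not part of the FASTA package.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Leading fields of one tabular line (names follow the guide's header)."""
    qseqid: str
    qlen: int
    sseqid: str
    slen: int
    percid: float
    rest: List[str]  # remaining columns, left unparsed


def parse_8cbl_line(line: str) -> Hit:
    """Split one tab-separated line; tab separation is an assumption."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated fields")
    return Hit(fields[0], int(fields[1]), fields[2], int(fields[3]),
               float(fields[4]), fields[5:])


# Hypothetical line, for illustration only
hit = parse_8cbl_line("query1\t218\tsubject1\t220\t85.5\t200\n")
query_longer = hit.qlen > hit.slen
```

Keeping the trailing columns as raw strings avoids guessing at fields that this summary does not describe.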
	50
	51	Changes in fasta-36.3.8g released 23-Oct-2018
	52
	53	1. (Oct. 2018) Improvements to scripts in the `psisearch2/` directory:
	54
	55	1. `psisearch2/m89_btop_msa2.pl`
	56	1. the `--clustal` option produces a "CLUSTALW (1.8)" header, which is required for some downstream programs.
	57	2. the `--trunc_acc` option removes the database and accession from identifiers of the form: `sp|P09488|GSTM1_HUMAN` to produce `GSTM1_HUMAN`.
	58	3. the `--min_align` option specifies the fraction of the query sequence that must be aligned, `(q_end-q_start+1)/q_length`.
	59	Together, these changes make it possible for the output of `m89_btop_msa2.pl` to be used by the EMBOSS program `fprotdist`.
	60
	61	2. A more general implementation of `psisearch2_msa_iter.sh`, which does `psisearch2` one iteration at a time, and a new equivalent `psisearch2_msa_iter_bl.sh`, which uses `psiblast` to do the search.
	62
	63	* (Oct. 2018) A small restructuring of the `make/Makefiles` to remove the `-lz` dependence for non-debugging scripts (and add it back when -DDEBUG is used).
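
The `--trunc_acc` and `--min_align` behaviors described above can be sketched in a few lines of Python. This is an illustration of the stated rules only, not the Perl code in `m89_btop_msa2.pl`; the function names are hypothetical, the `0.8` default mirrors the example value in `doc/readme.v36`, and the use of `>=` for the threshold comparison is an assumption.

```python
def trunc_acc(seq_id: str) -> str:
    """Reduce 'db|accession|name' to 'name'; leave other identifiers alone."""
    parts = seq_id.split("|")
    return parts[2] if len(parts) == 3 else seq_id


def aligned_fraction(q_start: int, q_end: int, q_length: int) -> float:
    """Fraction of the query covered: (q_end - q_start + 1) / q_length."""
    return (q_end - q_start + 1) / q_length


def passes_min_align(q_start: int, q_end: int, q_length: int,
                     min_align: float = 0.8) -> bool:
    """True when the aligned fraction reaches the threshold (>= is assumed)."""
    return aligned_fraction(q_start, q_end, q_length) >= min_align


short_id = trunc_acc("sp|P09488|GSTM1_HUMAN")  # identifier from the text above
```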
	64
	65	Changes in fasta-36.3.8g released 5-Aug-2018
	66
	67	1. (Apr 2018) incorporation of `-t t1` termination codes (`*`) in `-m 8CB`, `-m 8CC`, and `-m9C` so that aligned termination codons are indicated as `**` (`-m8CB`) or `*1` (`-m8CC`, `-m9C`).
	68
	69	2. (Mar 2018) Updates to scripts/annot_blast_btop2.pl to provide subalignment scoring for blastp searches (BLOSUM62 only). (see doc/readme.v36)
	70
	71	3. (Feb. 2018) a new extended option, `-XB`, which causes percent identity, percent similarity, and alignment length to be calculated using the BLAST model, which does not count gaps in the alignment length.
	72
	73	See `doc/readme.v36` for other bug fixes.
	74
	75	Changes in fasta-36.3.8g released 31-Dec-2017
	76
	77	1. (December, 2017) -- Make statistical thresholds more robust for small E()-values with normally distributed scores (`ggsearch36`,`glsearch36`).
	78
	79	2. (September, 2017) Treat lower-case queries with no upper-case residues as uppercase with `-S` option.
	80
	81	3. (May, 2017) Improvements/fixes to sub-alignment scoring strategies.
	82
	83	4. Improvements/fixes to psisearch2 scripts.
	84
	85	For more detailed information, see `doc/readme.v36`.
	86

+34

-18

doc/changes_v36.html

23	23	</small>
24	24	</pre>
25	25	<hr>
26		<h2>Latest Updates - FASTA version 36.3.8d (April, 2016)</h2>
27		<ol>
28		<li>
29		The <tt>fasta-36.3.8d/scripts/</tt> directory now provides a
30		script, <tt>annot_blast_btop2.pl</tt> that allows annotations and
31		sub-alignment scoring on BLAST alignments that use the tabular format
32		with BTOP alignment encoding.
33		<p>
34		<li>
35		Bug fixes for overlapping domain domain scoring. v36.3.7 was not thread-safe.
36		<li>
37		Annotation scripts accessing the Pfam domain database can now use
38		the <tt>--vdoms</tt> option to highlight missing parts of a Pfam
39		domain model. In addtion, domains from clans are labeled as clans
40		unless <tt>--no-clans</tt> is specified.
41		</ol>
42		<h2>Updates - FASTA version 36.3.7 (November, 2014)</h2>
	26	<h2>Latest Updates - FASTA version 36.3.8h (March, 2019)</h2>
43	27	<ol>
44	28	<li>The FASTA programs have been released under the Apache2.0 Open
45	29	Source License. The COPYRIGHT file, and copyright notices in
46	30	program files, have been updated to reflect this change.
47	31	<p>
	32	<li>
	33	fasta-36.3.8h includes bug fixes for translated alignments
	34	with termination codons, the ability to use scripts as query
	35	and library sequences, and new scripts for extracting genomic
	36	DNA sequences given chromosome coordinates.
	37	<li>
	38	fasta-36.3.8g includes bug fixes for sub-alignment scoring and
	39	psisearch2 scripts, new annotation scripts for exons, and
	40	fixes enabling very low statistical thresholds with ggsearch36
	41	and glsearch36.
	42	<li>
	43	fasta-36.3.8e/scripts includes updated scripts for
	44	capturing domain and feature annotations using the
	45	EBI/proteins API (https://www.ebi.ac.uk/proteins/api/) to get
	46	Uniprot annotations and exon locations.
	47	<p>
	48	<li>
	49	The <tt>fasta-36.3.8e/psisearch2/</tt> directory now
	50	provides <tt>psisearch2_msa.pl</tt>
	51	and <tt>psisearch2_msa.py</tt>, functionally identical scripts
	52	for iterative searching with <tt>psiblast</tt>
	53	or <tt>ssearch36</tt>. <tt>psisearch2_msa.pl</tt> offers an
	54	option, <tt>--query_seed</tt>, that can dramatically reduce
	55	false-positives caused by alignment overextension, with very
	56	little loss of search sensitivity.
	57	<p>
	58	<li>
	59	The <tt>fasta-36.3.8d/scripts/</tt> directory now provides a
	60	script, <tt>annot_blast_btop2.pl</tt> that allows annotations and
	61	sub-alignment scoring on BLAST alignments that use the tabular format
	62	with BTOP alignment encoding.
	63	<p>
48	64	<li>Alignment sub-scoring scripts have been extended to allow
49	65	overlapping domains. This requires a modified annotation file format.
50	66	The "classic" format placed the beginning and end of a domain on different lines:

69	85	</pre>
70	86	<p>
71	87	<li> New annotation scripts are available in
72		the <tt>fasta-36.3.7/scripts</tt> directory,
	88	the <tt>fasta-36.3.8/scripts</tt> directory,
73	89	e.g. <tt>ann_pfam_www_e.pl</tt> (Pfam) and <tt>ann_up_www2_e.pl</tt>
74	90	(Uniprot) to support this new format. If the domain annotations
75	91	provided by Pfam or Uniprot overlap, then overlapping domains are

doc/fasta_guide.pdf

Binary diff not shown

+62

-27

doc/fasta_guide.tex

266	266	with a '$>$' character, followed by the sequence itself:
267	267	\begin{quote}
268	268	\begin{verbatim}
269		>sequence name and description 1
	269	>sequence_name1 and description
270	270	A F A S Y T .... actual sequence.
271	271	F S S .... second line of sequence.
272		>sequence name and description 2
	272	>sequence_name2 and description
273	273	PMILTYV ... sequence 2
274	274	\end{verbatim}
275	275	\end{quote}
276	276	All of the characters of the description line are read, and special
277	277	characters can be used to indicate additional information about the
278		sequence. In general, non-amino-acid/non-nucleotide sequences in the
279		sequence lines are ignored.
	278	sequence. In particular, a \texttt{'@:C 12345'} at the end of the
	279	description line indicates that the first residue of the sequence has
	280	coordinate \texttt{'12345'}, instead of starting at \texttt{'1'}.
	281	Coordinates can be negative; a DNA sequence upstream from the start of
	282	transcription could be displayed with negative coordinates.
	283
	284	In general, non-amino-acid/non-nucleotide sequences in the sequence
	285	lines are ignored, with the exception of \texttt{'*'}, which indicates
	286	a termination codon in a protein sequence, and can be used to indicate
	287	the match to a termination codon in protein:DNA alignments.
280	288
281	289	FASTA format files from major sequence distributors, like the NCBI and
282	290	EBI, have specially formatted description lines, e.g.:\\
283	291	\indent
284	292	\texttt{
285		>gi\|54321\|ref\|np\_12345\| example NCBI refseq sequence\\
	293	>np\_12345\| example NCBI refseq sequence\\
286	294	}
287	295	or\\
288	296	\indent
289	297	\texttt{
290		>sw:gstm1\_human P01234 glutathione transferase GSTM1 - human\\
	298	>sp:gstm1\_human P01234 glutathione transferase GSTM1 - human\\
	299	}
	300	or\\
	301	\indent
	302	\texttt{
	303	>sp\|P09488\|GSTM1\_HUMAN glutathione transferase GSTM1 - human\\
291	304	}
292	305
293	306	Several sample test files are included with the FASTA distribution:

851	864	comments, \texttt{-m 8XC} without comments) and, if available, an
852	865	annotation encoding matching FASTA \texttt{-m 9C} output. All the
853	866	\texttt{-m 9c/C/d/D} encodings are available with BLAST tabular
854		output using \texttt{-m 8C[c/C/d/D]}.
	867	output using \texttt{-m 8C[c/C/d/D]}. In the v36.3.8h release, a
	868	new option has been added to \texttt{-m 8CB}, \texttt{-m 8CBL} (or
	869	\texttt{-m 8CBl}). The \texttt{L/l} option adds the lengths of the
	870	query and subject sequences after the \texttt{seqid}'s to BLAST
	871	tabular output, e.g. \texttt{qseqid qlen sseqid slen percid ...}
855	872
856	873	\item[\texttt{-m 9}] display alignment coordinates and scores with the
857	874	best score information. \texttt{-m 9i} provides alignment length,

925	942	\texttt{1M1X2M4X2M1X2M7X3M9D1M2X1M4X2M1X1M1X2I1X1M1X1M3X1M2X1I3M1D1X1M2X1M}
926	943	\end{footnotesize}
927	944	\item[\texttt{-m 10}]
928		a parseable format for use with other programs.
	945	a parseable format for use with other programs (this option is no longer reliably tested; \texttt{-m 8CBL} is easier to parse and tested more extensively).
929	946	\item[\texttt{-m 11}]
930	947	Provide \texttt{lav}-like output (used by \texttt{lalign}) for graphical output.
931	948	\begin{quote}

1123	1140	programs. (There is an option in the \texttt{Makefile},
1124	1141	\texttt{-DDNALIB\_LC}, to enable preserving case in DNA sequences.)
1125	1142
1126		\item[\texttt{-t \#}]
1127		Translation table - fastx36, tfastx36, fasty36, and
1128		tfasty3 now support the BLAST translation tables. See
1129		\url{http://www.ncbi.nih.gov/Taxonomy/Utils/wprintgc.cgi}.
1130
1131		\texttt{-t t} or \texttt{-t t\#} enables the addition of
1132		an implicit termination codon to a protein:translated DNA match. That
1133		is, each protein sequence implicitly ends with \texttt{*}, which
1134		matches the termination codes for the appropriate genetic code.
1135		\texttt{-t t\#} sets implicit termination and a different genetic
1136		code.
	1143	\item[\texttt{-t \#}] Translation table - fastx36, tfastx36, fasty36,
	1144	and tfasty36 now support the BLAST translation tables. See
	1145	\url{http://www.ncbi.nih.gov/Taxonomy/Utils/wprintgc.cgi}.
	1146
	1147	\texttt{-t 1} also enables translation of \texttt{'TGA'} to
	1148	\texttt{'U'} (seleno-cysteine) (by default, \texttt{'TGA'} is
	1149	translated to \texttt{'*'}). Because of the ambiguity of the
	1150	\texttt{'TGA'} codon, translated alignments of \texttt{'TGA'} with
	1151	\texttt{-t 1} match \texttt{'U'} and \texttt{'*'} (termination)
	1152	equally well.
	1153
	1154	\texttt{-t t} enables the addition of an implicit termination codon to
	1155	a protein:translated DNA match. That is, each protein sequence
	1156	implicitly ends with \texttt{*}, which matches the termination codes
	1157	for the appropriate genetic code. To change the translation table and
	1158	insert a termination character after each protein sequence, use
	1159	\texttt{-t 1 -t t}.
	1160
1137	1161	\item[\texttt{-T \#}]
1138	1162	set number of threads/workers. Normally on a multi-core machine, the maximum
1139	1163	number of processors/cores is used.

1348	1372	\item[\texttt{X1}] sort output by \texttt{init1} score (for
1349	1373	compatibility with FASTP; obsolete).
1350	1374
1351		\item[\texttt{XB}] (Previously \texttt{-B}.) Show the z-score, rather
	1375	\item[\texttt{XB}] Calculate percent identity, percent similarity, and
	1376	alignment length using the BLAST model, which excludes gapped residues.
	1377	This allows very high identity alignments with large gaps to look
	1378	much closer, but causes the alignment length to drop by the length
	1379	of the gap.
	1380
	1381	\item[\texttt{Xb}] (Previously \texttt{-B}.) Show the z-score, rather
1352	1382	than the bit-score in the list of best scores (rarely used, provided
1353	1383	for backward compatibility).
1354	1384

1794	1824	5 & NBRF/PIR VMS (\texttt{>P1;SEQID}/comment/sequence) (obsolete)\\
1795	1825	6 & GCG (version 8.0) Unix Protein and DNA (compressed)\\
1796	1826	7 & FASTQ (sequence only, quality ignored)\\
	1827	9 & a script that is executed to produce FASTA format sequences \\
1797	1828	10 & subset format (</slib2/swissprot.lseg 0:2 4\|) \\
1798	1829	11 & NCBI Blast1.3.2 format (unix only) (obsolete)\\
1799	1830	12 & NCBI Blast2.0 format\\

1869	1900	\section{Frequently Asked Questions (FAQs)}
1870	1901
1871	1902	{\noindent}\textbf{Where can I get FASTA?} --
1872		\url{http://faculty.virginia.edu/wrpearson/fasta} has the latest
1873		versions of the FASTA programs. This document describes
1874		\texttt{\CURRENT}, which is available from
1875		\url{http://faculty.virginia.edu/wrpearson/fasta/fasta3.tar.gz}.
1876		In addition, pre-compiled versions of the programs are available for
	1903
	1904	The most current version of the FASTA source code is available from
	1905	\url{http://github.com/wrpearson/fasta36}. In addition, you can get
	1906	the programs from \url{http://faculty.virginia.edu/wrpearson/fasta},
	1907	but sometimes there is a lag between the latest release on GitHub and
	1908	the compiled versions at \url{faculty.virginia.edu}. This document
	1909	describes \texttt{\CURRENT}, which is available from
	1910	\url{http://faculty.virginia.edu/wrpearson/fasta/fasta3.tar.gz}. In
	1911	addition, pre-compiled versions of the programs are available for
1877	1912	MacOSX and Windows.
1878	1913
1879	1914	\needspace{4\baselineskip}

1886	1921	Prot. & Prot. & \texttt{fasta36} & \texttt{blastp} & heuristic local similarity \\
1887	1922	& & \texttt{ssearch36} & & optimal local sim.\\
1888	1923	& & \texttt{ggsearch36} & & global:global sim. \\
1889		& & \texttt{ggearch36} & & global:local sim.\\
	1924	& & \texttt{glsearch36} & & global:local sim.\\
1890	1925	DNA & DNA & \texttt{fasta36}$^*$ & \texttt{blastn} & \\[1.2ex]
1891	1926	\hline \\[-1.0ex]
1892	1927	Prot. & Prot. & \texttt{lalign36} & & multiple non-intersecting \\

2028	2063	\begin{quote}
2029	2064	William R. Pearson\\
2030	2065	Department of Biochemistry\\
2031		Jordan Hall Box 800733\\
	2066	Pinn Hall Box 800733\\
2032	2067	U. of Virginia\\
2033	2068	Charlottesville, VA\\
2034	2069	wrp@virginia.EDU

+1

-0

doc/readme.md

0

README_v36.3.8h.md⏎

+1

-1

doc/readme.v34t0

110	110
111	111	This release provides an extremely efficient SSE2 implementation of
112	112	the Smith-Waterman algorithm for the SSE2 vector instructions written
113		by Michael Farrar (farrar.michael@gmail.com). The SSE code speeds up
	113	by Michael Farrar. The SSE code speeds up
114	114	Smith-Waterman 8 - 10-fold in my tests, making it comparable to Eric
115	115	Lindahl's Altivec code for the Apple/IBM G4/G5 architecture.
116	116

+199

-0

doc/readme.v36

4	4	multiple high-scoring alignments to be shown, rather than just one.
5	5	This is the main functional difference between FASTA and BLAST -
6	6	BLAST could show multiple HSPs, FASTA did not.
	7
	8	>>Aug. 9, 2019
	9	[src/ncbl2_mlib.c, ncbl2_head.h]
	10
	11	Modest extensions made to support reading makeblastdb format v5
	12	databases. Changes have only been made to read the db.pin file, but
	13	things work in simple tests.
	14
	15	>>July 16, 2019
	16	[src/comp_lib9.c]
	17
	18	Fixed a memory leak problem when searching with large libraries that
	19	could be memory mapped (libraries with .xin index files). If the
	20	library did not fit in memory, then the program kept allocating new memory.
	21	By default, the largest database that fits in memory must be less than
	22	16 GB. Larger libraries will be re-read, which slows down multi-query
	23	searches considerably. To increase the size of the library allowed in
	24	memory, use the option: "-X M32G" to fit 32 GB libraries.
	25
	26	>>Mar. 8, 2019
	27	[src/initfa.c,faatran.c,dropfx2.c]
	28	Modify translation table 1 to allow selenocysteine translation
	29	(TGA->'U'), and modify scoring matrices to give positive scores to
	30	'*':'U'. The translation modification ONLY works with "-t 1". In
	31	addition, BLAST BTOP alignments (-m 8CB) convert a 'U' aligned with a
	32	'*' to a '*', so the end of the alignment is '*' rather than 'U'
	33	(fastx36) or '*U' (tfastx36).
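
The effect of the translation-table change can be illustrated with a small Python sketch. It uses the standard genetic code (NCBI table 1) and a hypothetical `tga_as_sec` flag to mimic the 'TGA'->'U' behavior described above. It is not the code in faatran.c, and the handling of incomplete or ambiguous codons ('X') is an assumption made for the example.

```python
# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")


def translate(dna: str, tga_as_sec: bool = False) -> str:
    """Translate frame 1; optionally read TGA as 'U' (selenocysteine)."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if tga_as_sec and codon == "TGA":
            protein.append("U")
            continue
        try:
            idx = (16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1])
                   + BASES.index(codon[2]))
        except ValueError:
            protein.append("X")  # non-TCAG base: assumed placeholder
            continue
        protein.append(STANDARD_AA[idx])
    return "".join(protein)


default_translation = translate("ATGTGATAA")
sec_translation = translate("ATGTGATAA", tga_as_sec=True)
```

With the flag off, the internal TGA is read as a termination codon; with it on, only the final TAA is.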
	34
	35	dropfx2.c (fastx36/tfastx36), dropfz3.c(fasty36/tfasty36) did not
	36	properly switch protein and translated DNA codes with -m 8CB -- fixed.
	37
	38	version date updated to Mar, 2019
	39
	40	>>Feb. 26, 2019
	41	[scripts/get_genome_seq.py]
	42	added get_genome_seq.py as a replacement for get_hg38_bed.py, remove
	43	get_hg38_bed.py. 'get_genome_seq.py --genome mm10' also produces
	44	sequences from mouse mm10 (and can now do any genome that bedtools can
	45	read).
	46
	47	>>Feb. 23, 2019
	48	[src/comp_lib9.c, mshowbest.c]
	49	Modify repeat_thresh so that poor alignment scores (E() >
	50	ppst->e_cut_r, typically -E-threshold/10.0) do not look for additional
	51	alignments.
	52
	53	>>Feb. 21, 2019
	54	[src/nmgetaa.c, scaleswn.c, scripts/get_protein.py, get_hg38_bed.py]
	55
	56	Modify nmgetaa.c to ignore ':'s (for sequence subsets) in scripts.
	57	The script can do the subsetting. Modify scripts/get_protein.py to
	58	provide subsetting. Add scripts/get_hg38_bed.py to extract fasta
	59	sequences using the format "chr2:123456-543210"
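
A region string in the "chr2:123456-543210" form can be split into its parts with a short regular expression. The sketch below is illustrative only and is not taken from the scripts in this commit; the function name is hypothetical, and it makes no assumption about whether the coordinates are 0- or 1-based or whether start must be less than end.

```python
import re
from typing import Tuple

# chromosome name, then ':', then start '-' end as unsigned integers
_REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Return (chromosome, start, end) from 'chrN:start-end'."""
    match = _REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"not in chrom:start-end form: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


chrom, start, end = parse_region("chr2:123456-543210")
```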
	60
	61	Modify scaleswn.c to estimate Altschul-Gish parameters when gap and
	62	extension do not match exactly.
	63
	64	>>Feb. 6, 2019
	65	[src/compacc2e.c, nmgetaa.c]
	66	modify build_link_data() to allow '+' for space in scripts. Ensure
	67	that lib_type is properly initialized (open_lib.c()).
	68
	69	>>Jan. 23, 2019
	70	[nmgetaa.c]
	71	Fix bug introduced when checking for lib_type.
	72
	73	>>Jan. 15, 2019
	74	[src/upam.h, altlib.h, nmgetaa.c]
	75	[scripts/rename_exons.py, map_exons_coords.py, get_uniprot.py, get_refseq.py, get_proteins.py]
	76
	77	Bug fixes: The VT10, VT20, etc scoring matrices did not have scores for ':'
	78	alignments, used with FASTX/TFASTX for extending alignments through
	79	the termination codon. As a result, searches with '-t t' did not
	80	extend through the termination codon, even though they should have.
	81	This has been fixed.
	82
	83	Enhancements: FASTA can now download both query and library sequences using a script, by specifying file type 9. Thus:
	84
	85	fasta36 "../scripts/get_uniprot.py+P09488 9" /seqlib/swissprot.fasta
	86
	87	Will run the script "get_uniprot.py" with the argument "P09488" and
	88	use the output of the script as the query sequence. In this example,
	89	the library type (9) is specified by the " 9" (this space cannot be
	90	replaced with a '+' character).
	91
	92	Alternatively, library type '9' can be specified by putting a '!' before the script file name.
	93
	94	fasta36 \!../scripts/get_uniprot.py+P09488 /seqlib/swissprot.fasta
	95
	96	Scripts can be used to produce query or library sequences, or both.
	97	Three scripts that download sequences from the NCBI and Uniprot have
	98	been added in the "scripts" directory: "get_uniprot.py" takes Uniprot
	99	accessions as arguments, "get_refseq.py" takes refseq accessions
	100	(protein or mRNA), and "get_protein.py" gets both Uniprot and RefSeq
	101	protein sequences.
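
The two ways of naming a script described above (a trailing " 9" or a leading '!') can be summarized with a short Python sketch. It only mirrors the rules stated in this entry: '+' stands for a space inside the script command, and a literal space separates the command from the library type. The function name is hypothetical and the code is not a port of the C implementation.

```python
from typing import Optional, Tuple


def parse_script_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Return (command line, library type) for a query/library argument."""
    if spec.startswith("!"):
        # A leading '!' implies library type 9
        script_part, lib_type = spec[1:], 9
    else:
        # Otherwise the type, if any, follows the first literal space
        script_part, _, type_str = spec.partition(" ")
        lib_type = int(type_str) if type_str.strip() else None
    # '+' inside the script string stands for a space between arguments
    return script_part.replace("+", " "), lib_type


quoted_form = parse_script_spec("../scripts/get_uniprot.py+P09488 9")
bang_form = parse_script_spec("!../scripts/get_uniprot.py+P09488")
```

Both forms yield the same command and library type, which is why either can be used on the command line.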
	102
	103	rename_exons.py and map_exons_coords.py can take annotated BTOP
	104	alignments with genome coordinates and map exons to the alternative
	105	genome.
	106
	107	>>Jan. 2, 2019
	108	[src/mshowbest.c]
	109	Fix problems with site annotation when dom_info is provided with -m8CBL
	110	[scripts/ann_exons_up_sql.pl, ann_exons_up_www.pl]
	111	Make scripts more robust to missing chromosome information,
	112	reverse-strand coordinates.
	113
	114	>>Dec. 11, 2018
	115	[scripts/ann_exons_up_www.pl, ann_exons_up_sql.pl]
	116	Add the option "--gen_coord" to report the genome coordinates of exon
	117	start ('<') and end ('>') as features.
	118
	119	>>Nov. 14, 2018
	120	[scripts/rename_exons.py, relabel_domains.py, compacc2e.c]
	121
	122	Two new scripts, rename_exons.py and relabel_domains.py, that take a
	123	blast tabular output file with domain alignment annotations (and
	124	possibly raw domain information) and modify the names
	125	(rename_exons.py) or colors (relabel_domains.py). rename_exons.py
	126	takes the exon numbering associated with the query sequence and maps
	127	it onto the subject alignments. relabel_domains.py can be used to use
	128	different color numbers for homologous and non-homologous domains.
	129
	130	Both of these programs modify blast tabular output files, which can
	131	then be merged back into an alignment display using
	132	merge_blastp_annot.pl or merge_fasta_annot.pl.
	133
	134	compacc2e.c:build_link_data() has been modified to convert '+' in the
	135	script string to ' ', to allow passing command line options. A space
	136	in the script string is used to separate the script from the library
	137	type of the file returned by the script.
	138
	139	>>Nov. 6-7, 2018
	140	[doinit.c, mshowbest.c, mshowalign2.c, defs.h, structs.h]
	141
	142	(a) Add options to provide query and subject sequence lengths and raw
	143	domain coordinates in BLASTP tabular output with the options -m 8CBl
	144	and -m 8CBL. If domain annotations are available, -m 8CBL also
	145	provides the raw domain coordinates (not just those included in the
	146	alignment) in the form |DX:1-100;C=PF12345|XD:1-100;C=PF12345 where
	147	|DX indicates a query annotation and |XD indicates a subject annotation. -m
	148	8CBl (lower-case L) shows the sequence lengths, but not the raw domain
	149	info.
	150
	151	(b) parse the annotation program strings so that '+' are converted to
	152	' '. This greatly simplifies passing arguments to the annotation scripts. Thus:
	153
	154	-V \!ann_pfam_sql.pl --db=pfam31 --neg --vdoms can be written as:
	155	-V \!ann_pfam_sql.pl+--db=pfam31+--neg+--vdoms (likewise for -V q\!ann_pfam...)
	156
	157	(c) provide an option to remove region/feature annotations from non-m8
	158	(blast-tabular) output. This simplifies the process of using
	159	scripts/merge_fasta_btab.pl to use .bl_tab (-m 8CBL) files to inject
	160	sub-alignment scores and domain information.
	161
	162	>>Nov. 1, 2018
	163	[doinit.c]
	164	Allow -m F#=file.name in addition to -m "F# file.name" to address
	165	problems I had with spaces in shell scripts.
	166
	167	>>Oct. 23, 2018 [re-released as fasta-36.3.8g] (see README_v36.3.8g.md)
	168	[make/Makefiles*,psisearch2/m89_btop_msa2.pl]
	169
	170	Add options to psisearch2/m89_btop_msa2.pl to provide clustalw header
	171	(--clustal), require a minimum coverage of the query sequence
	172	(--min_align 0.8), and edit sequence identifiers to remove database
	173	and accession (--trunc_acc).
	174
	175	Remove -lz dependency from non-debug Makefiles.
	176
	177	>>Aug. 5, 2018 [re-released as fasta-36.3.8g]
	178	[lib_sel.c]
	179	Make lib_sel.c more robust to missing indirect name files.
	180	[scripts/ann*.pl]
	181	Update various annotation scripts to use https:// instead of http://
	182
	183	>>April 3, 2018
	184	[initfa.c, comp_lib.c, dropfx2.c]
	185	Changes to (a) ensure that the "-t t" option correctly inserts and
	186	aligns a termination codon '*', and (b) modify -m 8CB, -m 8CC, and -m 9C
	187	so that aligned termination codons are indicated as "**" (-m 8CB) or
	188	"*1" (-m 8CC, -m 9C).
	189
	190	>>Mar. 9, 2018
	191	[scripts/annot_blast_btop2.pl, merge_blast_btab.pl, blastp_annot_cmd.sh]
	192	Code is now in place to provide sub-alignment scoring using domain
	193	annotations with blastp searches (BLOSUM62 only). blastp_annot_cmd.sh
	194	runs blast and produces both a standard HTML and a tabular output
	195	file. It then runs annot_blast_btop2.pl to add sub-alignment scoring
	196	to the tabular output file, and then merge_blast_btab.pl merges the
	197	domain-annotated blast tabular file with the HTML output file. When
	198	combined in this way, the FASTA web server (fasta.bioch.virginia.edu)
	199	can produce blastp searches with domain highlights/scoring.
	200
	201	>>Feb. 6, 2018
	202	[initfa.c, doinit.c, mshowbest.c, mshowalign2.c]
	203	Add a new extended option, -XB, which causes percent identity, percent
	204	similarity, and alignment length to be presented using the BLAST
	205	model, which does not count gaps in the alignment length.
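
The difference between the two conventions contrasted above can be sketched on a pair of aligned strings in which '-' marks a gap. This is an illustrative calculation only; the function name `percent_identity` is hypothetical, and the code in mshowbest.c and mshowalign2.c is not shown here.

```c
#include <assert.h>
#include <string.h>

/* Percent identity for two aligned strings of equal length ('-' is a gap).
   If count_gaps is nonzero, every column counts toward the alignment length;
   otherwise columns that contain a gap are left out of the length.
   Hypothetical helper for illustration only. */
double percent_identity(const char *a, const char *b, int count_gaps)
{
    assert(a != NULL && b != NULL && strlen(a) == strlen(b));
    size_t ident = 0, len = 0;
    for (size_t i = 0; a[i] != '\0'; i++) {
        int gap = (a[i] == '-' || b[i] == '-');
        if (gap && !count_gaps) continue;   /* gap column excluded from length */
        len++;
        if (!gap && a[i] == b[i]) ident++;
    }
    return len ? 100.0 * (double)ident / (double)len : 0.0;
}
```

For the aligned pair `AC-D` / `ACED`, three of four columns are identical, so counting the gap column gives 75%, while excluding it from the length gives 100%.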
7	206
8	207	>>Dec. 30, 2017 [released as fasta-36.3.8g]
9	208	[scaleswn.c]

+0

-1

~~make/Makefile.linux~~

0

Makefile.linux64_sse2⏎

+67

-0

make/Makefile.linux

	0	# $ Id: $
	1	#
	2	# makefile for fasta3, fasta3_t Use Makefile.mpi for fasta36_mpi
	3	#
	4	# This file is designed for 64-bit Linux systems using an X86
	5	# architecture with SSE2 extensions. -D_LARGEFILE64_SOURCE and
	6	# -DBIG_LIB64 require a 64-bit linux system.
	7	# SSE2 extensions are used for ssearch35(_t)
	8	#
	9	# Use Makefile.linux32_sse2 for 32-bit linux x86
	10	#
	11
	12	SHELL=/bin/bash
	13
	14	CC = gcc -g -O -msse2
	15	LIB_DB=
	16
	17	#CC= gcc -pg -g -O -msse2 -ffast-math
	18	#CC = gcc -g -DDEBUG -msse2
	19	#CC=gcc -Wall -pedantic -ansi -g -msse2 -DDEBUG
	20
	21	# EBI uses the following with pgcc, -O3 does not work:
	22	# CC= pgcc -O2 -pipe -mcpu=pentiumpro -march=pentiumpro -fomit-frame-pointer
	23
	24	# this file works for x86 LINUX
	25
	26	# standard options
	27
	28	CFLAGS= -DSHOW_HELP -DSHOWSIM -DUNIX -DTIMES -DHZ=100 -DMAX_WORKERS=8 -DTHR_EXIT=pthread_exit -DM10_CONS -D_REENTRANT -DHAS_INTTYPES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DUSE_FSEEKO -DSAMP_STATS -DPGM_DOC -DUSE_MMAP -D_LARGEFILE64_SOURCE -DBIG_LIB64
	29	# -I/usr/include/mysql -DMYSQL_DB
	30	# -DSUPERFAMNUM -DSFCHAR="'\|'"
	31
	32	#
	33	#(for mySQL databases) (also requires change to Makefile36m.common or use of Makefile36m.common_mysql)
	34	# run 'mysql_config' so find locations of mySQL files
	35
	36	LIB_M = -lm
	37	# for mySQL databases
	38	# LIB_M = -L/usr/lib64/mysql -lmysqlclient -lm
	39
	40	HFLAGS= -o
	41	NFLAGS= -o
	42
	43	# for Linux
	44	THR_SUBS = pthr_subs2
	45	THR_LIBS = -lpthread
	46	THR_CC =
	47
	48	BIN = ../bin
	49	XDIR = /seqprg/bin
	50	#XDIR = ~/bin/LINUX
	51
	52	# set up files for SSE2/Altivec acceleration
	53	#
	54	include ../make/Makefile.sse_alt
	55
	56	# SSE2 acceleration
	57	#
	58	DROPGSW_O = $(DROPGSW_SSE_O)
	59	DROPLAL_O = $(DROPLAL_SSE_O)
	60	DROPGNW_O = $(DROPGNW_SSE_O)
	61	DROPLNW_O = $(DROPLNW_SSE_O)
	62
	63	# renamed (fasta36) programs
	64	include ../make/Makefile36m.common
	65	# conventional (fasta3) names
	66	# include ../make/Makefile.common

+2

-0

make/Makefile.linux32

12	12
13	13	#CC= gcc -g -O
14	14	#CC = gcc -g -DDEBUG
	15	#LIB_DB=
15	16
16	17	#CC=gcc -Wall -pedantic -ansi -g -O
17	18	CC= /usr/local/parasoft/bin/insure -g -DDEBUG
	19	LIB_DB=-lz
18	20
19	21	# EBI uses the following with pgcc, -O3 does not work:
20	22	# CC= pgcc -O2 -pipe -mcpu=pentiumpro -march=pentiumpro -fomit-frame-pointer

+2

-0

make/Makefile.linux32_sse2

12	12	SHELL=/bin/bash
13	13
14	14	CC= gcc -g -O -msse2 -ffast-math
	15	LIB_DB=
15	16	#CC = gcc -g -DDEBUG -msse2
16	17
17	18	#CC= /usr/local/parasoft/bin/insure -g -DDEBUG
	19	#LIB_DB=-lz
18	20
19	21	#CC=gcc -Wall -pedantic -ansi -g -O
20	22

+0

-1

~~make/Makefile.linux64~~

0

Makefile.linux64_sse2⏎

+67

-0

make/Makefile.linux64

	0	# $ Id: $
	1	#
	2	# makefile for fasta3, fasta3_t Use Makefile.mpi for fasta36_mpi
	3	#
	4	# This file is designed for 64-bit Linux systems using an X86
	5	# architecture with SSE2 extensions. -D_LARGEFILE64_SOURCE and
	6	# -DBIG_LIB64 require a 64-bit linux system.
	7	# SSE2 extensions are used for ssearch35(_t)
	8	#
	9	# Use Makefile.linux32_sse2 for 32-bit linux x86
	10	#
	11
	12	SHELL=/bin/bash
	13
	14	CC = gcc -g -O -msse2
	15	LIB_DB=
	16
	17	#CC= gcc -pg -g -O -msse2 -ffast-math
	18	#CC = gcc -g -DDEBUG -msse2
	19	#CC=gcc -Wall -pedantic -ansi -g -msse2 -DDEBUG
	20
	21	# EBI uses the following with pgcc, -O3 does not work:
	22	# CC= pgcc -O2 -pipe -mcpu=pentiumpro -march=pentiumpro -fomit-frame-pointer
	23
	24	# this file works for x86 LINUX
	25
	26	# standard options
	27
	28	CFLAGS= -DSHOW_HELP -DSHOWSIM -DUNIX -DTIMES -DHZ=100 -DMAX_WORKERS=8 -DTHR_EXIT=pthread_exit -DM10_CONS -D_REENTRANT -DHAS_INTTYPES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DUSE_FSEEKO -DSAMP_STATS -DPGM_DOC -DUSE_MMAP -D_LARGEFILE64_SOURCE -DBIG_LIB64
	29	# -I/usr/include/mysql -DMYSQL_DB
	30	# -DSUPERFAMNUM -DSFCHAR="'\|'"
	31
	32	#
	33	#(for mySQL databases) (also requires change to Makefile36m.common or use of Makefile36m.common_mysql)
	34	# run 'mysql_config' so find locations of mySQL files
	35
	36	LIB_M = -lm
	37	# for mySQL databases
	38	# LIB_M = -L/usr/lib64/mysql -lmysqlclient -lm
	39
	40	HFLAGS= -o
	41	NFLAGS= -o
	42
	43	# for Linux
	44	THR_SUBS = pthr_subs2
	45	THR_LIBS = -lpthread
	46	THR_CC =
	47
	48	BIN = ../bin
	49	XDIR = /seqprg/bin
	50	#XDIR = ~/bin/LINUX
	51
	52	# set up files for SSE2/Altivec acceleration
	53	#
	54	include ../make/Makefile.sse_alt
	55
	56	# SSE2 acceleration
	57	#
	58	DROPGSW_O = $(DROPGSW_SSE_O)
	59	DROPLAL_O = $(DROPLAL_SSE_O)
	60	DROPGNW_O = $(DROPGNW_SSE_O)
	61	DROPLNW_O = $(DROPLNW_SSE_O)
	62
	63	# renamed (fasta36) programs
	64	include ../make/Makefile36m.common
	65	# conventional (fasta3) names
	66	# include ../make/Makefile.common

+2

-0

make/Makefile.linux64_sse2

12	12	SHELL=/bin/bash
13	13
14	14	CC = gcc -g -O -msse2
	15	LIB_DB=
	16
15	17	#CC= gcc -pg -g -O -msse2 -ffast-math
16	18	#CC = gcc -g -DDEBUG -msse2
17	19	#CC=gcc -Wall -pedantic -ansi -g -msse2 -DDEBUG

+2

-0

make/Makefile.linux_icc

7	7	SHELL=/bin/bash
8	8
9	9	CC= icc -g -O3
	10	LIB_DB=
10	11	#CC = icc -g -DDEBUG
	12	#LIB_DB=-lz
11	13
12	14	#CC=gcc -Wall -pedantic -ansi -g -O
13	15	#CC= /usr/local/parasoft/bin/insure -g -DDEBUG

+3

-1

make/Makefile.linux_icc_sse2

8	8
9	9	SHELL=/bin/bash
10	10
11		CC= icc -O3 -g
	11	CC= icc -O3 -g -pthread
	12	LIB_DB=
12	13	#CC = icc -g -DDEBUG
	14	#LIB_DB=-lz
13	15
14	16	#CC=gcc -Wall -pedantic -ansi -g -O
15	17	#CC= /usr/local/parasoft/bin/insure -g -DDEBUG

+2

-0

make/Makefile.linux_mysql

10	10	SHELL=/bin/bash
11	11
12	12	CC= gcc -g -O2
	13	LIB_DB=
13	14	#CC= gcc -g -DDEBUG
	15	#LIB_DB=-lz
14	16
15	17	# this file works for x86 LINUX
16	18

+2

-0

make/Makefile.linux_pgsql

10	10	SHELL=/bin/bash
11	11
12	12	CC= gcc -g -O
	13	LIB_DB=
13	14	#CC= gcc -g -DDEBUG
	15	#LIB_DB=-lz
14	16	#CC=/opt/parasoft/bin.linux2/insure -g -DDEBUG
15	17
16	18	# this file works for x86 LINUX

+2

-0

make/Makefile.linux_sql

10	10	SHELL=/bin/bash
11	11
12	12	CC= gcc -g -O
	13	LIB_DB=
13	14	#CC= gcc -g -DDEBUG
	15	#LIB_DB=-lz
14	16	#CC=/opt/parasoft/bin.linux2/insure -g -DDEBUG
15	17
16	18	# this file works for x86 LINUX

+0

-1

~~make/Makefile.linux_sse2~~

0

Makefile.linux64_sse2⏎

+67

-0

make/Makefile.linux_sse2

	0	# $ Id: $
	1	#
	2	# makefile for fasta3, fasta3_t Use Makefile.mpi for fasta36_mpi
	3	#
	4	# This file is designed for 64-bit Linux systems using an X86
	5	# architecture with SSE2 extensions. -D_LARGEFILE64_SOURCE and
	6	# -DBIG_LIB64 require a 64-bit linux system.
	7	# SSE2 extensions are used for ssearch35(_t)
	8	#
	9	# Use Makefile.linux32_sse2 for 32-bit linux x86
	10	#
	11
	12	SHELL=/bin/bash
	13
	14	CC = gcc -g -O -msse2
	15	LIB_DB=
	16
	17	#CC= gcc -pg -g -O -msse2 -ffast-math
	18	#CC = gcc -g -DDEBUG -msse2
	19	#CC=gcc -Wall -pedantic -ansi -g -msse2 -DDEBUG
	20
	21	# EBI uses the following with pgcc, -O3 does not work:
	22	# CC= pgcc -O2 -pipe -mcpu=pentiumpro -march=pentiumpro -fomit-frame-pointer
	23
	24	# this file works for x86 LINUX
	25
	26	# standard options
	27
	28	CFLAGS= -DSHOW_HELP -DSHOWSIM -DUNIX -DTIMES -DHZ=100 -DMAX_WORKERS=8 -DTHR_EXIT=pthread_exit -DM10_CONS -D_REENTRANT -DHAS_INTTYPES -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DUSE_FSEEKO -DSAMP_STATS -DPGM_DOC -DUSE_MMAP -D_LARGEFILE64_SOURCE -DBIG_LIB64
	29	# -I/usr/include/mysql -DMYSQL_DB
	30	# -DSUPERFAMNUM -DSFCHAR="'\|'"
	31
	32	#
	33	#(for mySQL databases) (also requires change to Makefile36m.common or use of Makefile36m.common_mysql)
	34	# run 'mysql_config' so find locations of mySQL files
	35
	36	LIB_M = -lm
	37	# for mySQL databases
	38	# LIB_M = -L/usr/lib64/mysql -lmysqlclient -lm
	39
	40	HFLAGS= -o
	41	NFLAGS= -o
	42
	43	# for Linux
	44	THR_SUBS = pthr_subs2
	45	THR_LIBS = -lpthread
	46	THR_CC =
	47
	48	BIN = ../bin
	49	XDIR = /seqprg/bin
	50	#XDIR = ~/bin/LINUX
	51
	52	# set up files for SSE2/Altivec acceleration
	53	#
	54	include ../make/Makefile.sse_alt
	55
	56	# SSE2 acceleration
	57	#
	58	DROPGSW_O = $(DROPGSW_SSE_O)
	59	DROPLAL_O = $(DROPLAL_SSE_O)
	60	DROPGNW_O = $(DROPGNW_SSE_O)
	61	DROPLNW_O = $(DROPLNW_SSE_O)
	62
	63	# renamed (fasta36) programs
	64	include ../make/Makefile36m.common
	65	# conventional (fasta3) names
	66	# include ../make/Makefile.common

+3

-0

make/Makefile.os_x

12	12
13	13	# in my hands, gcc-4.0 is about 40% slower than gcc-3.3 on the Altivec code
14	14	CC= gcc -g -O3 -arch ppc -falign-loops=32 -O3 -maltivec -mpim-altivec -force_cpusubtype_ALL
	15	LIB_DB=
	16
15	17	# -pg -finstrument-functions -lSaturn
16	18
17	19	#CC= gcc-3.3 -g -falign-loops=32 -O3 -mcpu=7450 -faltivec
18	20	#CC= gcc-3.3 -g -DDEBUG -mcpu=7450 -faltivec
	21	#LIB_DB=-lz
19	22	#CC= cc -g -Wall -pedantic -faltivec
20	23	#
21	24	# standard line for normal searching

+2

-0

make/Makefile.os_x86

12	12	SHELL=/bin/bash
13	13
14	14	CC= gcc -g -O3 -arch i386 -msse2
	15	LIB_DB=
15	16	#CC= gcc -g -DDEBUG -arch i386 -msse2
	17	#LIB_DB=-lz
16	18
17	19	#CC= cc -g -Wall -pedantic
18	20	#

+3

-0

make/Makefile.os_x86_64

12	12	SHELL=/bin/bash
13	13
14	14	CC= cc -O -g -arch x86_64 -msse2
	15	LIB_DB=
	16
15	17	#CC= cc -g -DDEBUG -fsanitize=address -arch x86_64 -msse2
	18	#LIB_DB=-lz
16	19
17	20	#CC= cc -g -Wall -pedantic
18	21	#

+2

-0

make/Makefile.os_x86_clang

12	12	SHELL=/bin/bash
13	13
14	14	CC= clang -g -O -arch x86_64 -msse2
	15	LIB_DB=
15	16	#CC= clang -g -DDEBUG -arch x86_64 -msse2
	17	#LIB_DB=-lz
16	18
17	19	#CC= cc -g -Wall -pedantic
18	20	#

+2

-0

make/Makefile.os_x86_icc

12	12	SHELL=/bin/bash
13	13
14	14	CC= icc -g -O -m64 # intel icc compiler
	15	LIB_DB=
15	16	#CC= icc -g -DDEBUG -m64
	17	#LIB_DB=-lz
16	18
17	19	#CC= cc -g -Wall -pedantic
18	20	#

+41

-41

make/Makefile.pcom

61	61	pushd $(BIN); cp $(TPROGS) $(XDIR); popd
62	62
63	63	fasta36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
64		$(CC) $(HFLAGS) $(BIN)/fasta36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M)
	64	$(CC) $(HFLAGS) $(BIN)/fasta36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
65	65
66	66	fastx36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fx.o scale_se.o karlin.o drop_fx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
67		$(CC) $(HFLAGS) $(BIN)/fastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fx.o drop_fx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	67	$(CC) $(HFLAGS) $(BIN)/fastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fx.o drop_fx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
68	68
69	69	fasty36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fy.o scale_se.o karlin.o drop_fz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
70		$(CC) $(HFLAGS) $(BIN)/fasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fy.o drop_fz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	70	$(CC) $(HFLAGS) $(BIN)/fasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fy.o drop_fz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
71	71
72	72	fastf36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ff.o scaleswts.o last_tat.o tatstats_ff.o karlin.o $(DROPFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
73		$(CC) $(HFLAGS) $(BIN)/fastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswts.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M)
	73	$(CC) $(HFLAGS) $(BIN)/fastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswts.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
74	74
75	75	fasts36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fs.o scaleswts.o last_tat.o tatstats_fs.o karlin.o $(DROPFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
76		$(CC) $(HFLAGS) $(BIN)/fasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M)
	76	$(CC) $(HFLAGS) $(BIN)/fasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
77	77
78	78	fastm36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fm.o scaleswts.o last_tat.o tatstats_fm.o karlin.o $(DROPFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
79		$(CC) $(HFLAGS) $(BIN)/fastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M)
	79	$(CC) $(HFLAGS) $(BIN)/fastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
80	80
81	81	tfastx36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfx.o scale_se.o karlin.o drop_tfx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
82		$(CC) $(HFLAGS) $(BIN)/tfastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	82	$(CC) $(HFLAGS) $(BIN)/tfastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
83	83
84	84	tfasty36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfy.o scale_se.o karlin.o drop_tfz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
85		$(CC) $(HFLAGS) $(BIN)/tfasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	85	$(CC) $(HFLAGS) $(BIN)/tfasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
86	86
87	87	tfastf36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(DROPTFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
88		$(CC) $(HFLAGS) $(BIN)/tfastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	88	$(CC) $(HFLAGS) $(BIN)/tfastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
89	89
90	90	tfastf36s : $(COMP_LIBO) $(COMPACC_SO) showsum.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o scaleswtf.o karlin.o $(DROPTFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
91		$(CC) $(HFLAGS) $(BIN)/tfastf36s $(COMP_LIBO) $(COMPACC_SO) showsum.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	91	$(CC) $(HFLAGS) $(BIN)/tfastf36s $(COMP_LIBO) $(COMPACC_SO) showsum.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
92	92
93	93	tfasts36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfs.o scaleswts.o tatstats_fs.o last_tat.o karlin.o $(DROPTFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
94		$(CC) $(HFLAGS) $(BIN)/tfasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o tatstats_fs.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	94	$(CC) $(HFLAGS) $(BIN)/tfasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o tatstats_fs.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
95	95
96	96	tfastm36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o scaleswts.o tatstats_fm.o last_tat.o karlin.o $(DROPTFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
97		$(CC) $(HFLAGS) $(BIN)/tfastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o $(DROPTFM_O) scaleswts.o tatstats_fm.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	97	$(CC) $(HFLAGS) $(BIN)/tfastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o $(DROPTFM_O) scaleswts.o tatstats_fm.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
98	98
99	99	ssearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
100		$(CC) $(HFLAGS) $(BIN)/ssearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	100	$(CC) $(HFLAGS) $(BIN)/ssearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
101	101
102	102	# do not use accelerated Smith-Waterman
103	103	ssearch36s : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_NA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
104		$(CC) $(HFLAGS) $(BIN)/ssearch36s $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	104	$(CC) $(HFLAGS) $(BIN)/ssearch36s $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
105	105
106	106	lalign36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o scale_se.o karlin.o last_thresh.o $(DROPLAL_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
107		$(CC) $(HFLAGS) $(BIN)/lalign36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o $(DROPLAL_O) scale_se.o karlin.o last_thresh.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	107	$(CC) $(HFLAGS) $(BIN)/lalign36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o $(DROPLAL_O) scale_se.o karlin.o last_thresh.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
108	108
109	109	osearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ssw.o scale_se.o karlin.o $(DROPNSW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
110		$(CC) $(HFLAGS) $(BIN)/osearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ssw.o $(DROPNSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M)
	110	$(CC) $(HFLAGS) $(BIN)/osearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ssw.o $(DROPNSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
111	111
112	112	glsearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPLNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
113		$(CC) $(HFLAGS) $(BIN)/glsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	113	$(CC) $(HFLAGS) $(BIN)/glsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
114	114
115	115	ggsearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPGNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
116		$(CC) $(HFLAGS) $(BIN)/ggsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	116	$(CC) $(HFLAGS) $(BIN)/ggsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
117	117
118	118	prss36 : ssearch36
119	119	ln -sf ssearch36 prss36
120	120
121	121	ssearch36_t : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
122		$(CC) $(HFLAGS) $(BIN)/ssearch36_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	122	$(CC) $(HFLAGS) $(BIN)/ssearch36_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
123	123
124	124	ssearch36s_t : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_NA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
125		$(CC) $(HFLAGS) $(BIN)/ssearch36s_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	125	$(CC) $(HFLAGS) $(BIN)/ssearch36s_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
126	126
127	127	glsearch36_t : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPLNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
128		$(CC) $(HFLAGS) $(BIN)/glsearch36_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	128	$(CC) $(HFLAGS) $(BIN)/glsearch36_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
129	129
130	130	glsearch36s_t : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPLNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
131		$(CC) $(HFLAGS) $(BIN)/glsearch36s_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	131	$(CC) $(HFLAGS) $(BIN)/glsearch36s_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
132	132
133	133	ggsearch36_t : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPGNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
134		$(CC) $(HFLAGS) $(BIN)/ggsearch36_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	134	$(CC) $(HFLAGS) $(BIN)/ggsearch36_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
135	135
136	136	ggsearch36s_t : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPGNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
137		$(CC) $(HFLAGS) $(BIN)/ggsearch36s_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	137	$(CC) $(HFLAGS) $(BIN)/ggsearch36s_t $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
138	138
139	139	fasta36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
140		$(CC) $(HFLAGS) $(BIN)/fasta36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	140	$(CC) $(HFLAGS) $(BIN)/fasta36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
141	141
142	142	fasta36sum_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
143		$(CC) $(HFLAGS) $(BIN)/fasta36sum_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	143	$(CC) $(HFLAGS) $(BIN)/fasta36sum_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
144	144
145	145	fasta36u_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showun.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
146		$(CC) $(HFLAGS) $(BIN)/fasta36u_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showun.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	146	$(CC) $(HFLAGS) $(BIN)/fasta36u_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showun.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
147	147
148	148	fasta36r_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showrel.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
149		$(CC) $(HFLAGS) $(BIN)/fasta36r_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showrel.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	149	$(CC) $(HFLAGS) $(BIN)/fasta36r_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showrel.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
150	150
151	151	fastf36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(DROPFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
152		$(CC) $(HFLAGS) $(BIN)/fastf36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	152	$(CC) $(HFLAGS) $(BIN)/fastf36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
153	153
154	154	fastf36s_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o scaleswtf.o karlin.o $(DROPFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
155		$(CC) $(HFLAGS) $(BIN)/fastf36s_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	155	$(CC) $(HFLAGS) $(BIN)/fastf36s_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
156	156
157	157	fasts36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fs.o scaleswts.o last_tat.o tatstats_fs.o karlin.o $(DROPFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
158		$(CC) $(HFLAGS) $(BIN)/fasts36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	158	$(CC) $(HFLAGS) $(BIN)/fasts36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
159	159
160	160	fastm36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fm.o scaleswts.o last_tat.o tatstats_fm.o karlin.o $(DROPFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
161		$(CC) $(HFLAGS) $(BIN)/fastm36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	161	$(CC) $(HFLAGS) $(BIN)/fastm36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
162	162
163	163	fastx36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_fx.o faatran.o scale_se.o karlin.o drop_fx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
164		$(CC) $(HFLAGS) $(BIN)/fastx36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fx.o drop_fx.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	164	$(CC) $(HFLAGS) $(BIN)/fastx36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fx.o drop_fx.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
165	165
166	166	fasty36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_fy.o faatran.o scale_se.o karlin.o drop_fz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
167		$(CC) $(HFLAGS) $(BIN)/fasty36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fy.o drop_fz.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	167	$(CC) $(HFLAGS) $(BIN)/fasty36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fy.o drop_fz.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
168	168
169	169	tfasta36 : $(COMP_LIBO) compacc.o $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o scale_se.o karlin.o $(DROPTFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
170		$(CC) $(HFLAGS) $(BIN)/tfasta36 $(COMP_LIBO) compacc.o $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	170	$(CC) $(HFLAGS) $(BIN)/tfasta36 $(COMP_LIBO) compacc.o $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
171	171
172	172	tfasta36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_tfa.o scale_se.o karlin.o $(DROPTFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
173		$(CC) $(HFLAGS) $(BIN)/tfasta36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	173	$(CC) $(HFLAGS) $(BIN)/tfasta36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
174	174
175	175	tfastf36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_tf.o scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(DROPTFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
176		$(CC) $(HFLAGS) $(BIN)/tfastf36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	176	$(CC) $(HFLAGS) $(BIN)/tfastf36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
177	177
178	178	tfasts36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_tfs.o scaleswts.o last_tat.o tatstats_fs.o karlin.o $(DROPTFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
179		$(CC) $(HFLAGS) $(BIN)/tfasts36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	179	$(CC) $(HFLAGS) $(BIN)/tfasts36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
180	180
181	181	tfastx36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfx.o scale_se.o karlin.o drop_tfx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
182		$(CC) $(HFLAGS) $(BIN)/tfastx36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	182	$(CC) $(HFLAGS) $(BIN)/tfastx36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
183	183
184	184	tfasty36_t : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfy.o scale_se.o karlin.o drop_tfz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
185		$(CC) $(HFLAGS) $(BIN)/tfasty36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	185	$(CC) $(HFLAGS) $(BIN)/tfasty36_t $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
186	186
187	187	comp_mlib5e.o : comp_lib5e.c mw.h structs.h defs.h param.h
188	188	$(CC) $(THR_CC) $(CFLAGS) -DCOMP_MLIB -c comp_lib5e.c -o comp_mlib5e.o

212	212	$(CC) $(THR_CC) $(CFLAGS) -c work_thr2.c
213	213
214	214	print_pssm : print_pssm.c getseq.c karlin.c apam.cn pssm_asn_subs.c
215		$(CC) -o print_pssm $(CFLAGS) print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c $(LIB_M)
	215	$(CC) -o print_pssm $(CFLAGS) print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c $(LIB_M) $(LIB_DB)
216	216
217	217	map_db : map_db.c uascii.h ncbl2_head.h
218	218	$(CC) $(CFLAGS) -o $(BIN)/map_db map_db.c

+20

-20

make/Makefile.pcom_s

57	57	pushd $(BIN); cp $(TPROGS) $(XDIR); popd
58	58
59	59	fasta36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
60		$(CC) $(HFLAGS) $(BIN)/fasta36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M)
	60	$(CC) $(HFLAGS) $(BIN)/fasta36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
61	61
62	62	fastx36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fx.o scale_se.o karlin.o drop_fx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
63		$(CC) $(HFLAGS) $(BIN)/fastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fx.o drop_fx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	63	$(CC) $(HFLAGS) $(BIN)/fastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fx.o drop_fx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
64	64
65	65	fasty36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fy.o scale_se.o karlin.o drop_fz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
66		$(CC) $(HFLAGS) $(BIN)/fasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fy.o drop_fz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	66	$(CC) $(HFLAGS) $(BIN)/fasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fy.o drop_fz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
67	67
68	68	fastf36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ff.o scaleswts.o last_tat.o tatstats_ff.o karlin.o $(DROPFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
69		$(CC) $(HFLAGS) $(BIN)/fastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswts.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M)
	69	$(CC) $(HFLAGS) $(BIN)/fastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswts.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
70	70
71	71	fasts36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fs.o scaleswts.o last_tat.o tatstats_fs.o karlin.o $(DROPFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
72		$(CC) $(HFLAGS) $(BIN)/fasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M)
	72	$(CC) $(HFLAGS) $(BIN)/fasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
73	73
74	74	fastm36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fm.o scaleswts.o last_tat.o tatstats_fm.o karlin.o $(DROPFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
75		$(CC) $(HFLAGS) $(BIN)/fastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M)
	75	$(CC) $(HFLAGS) $(BIN)/fastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
76	76
77	77	tfastx36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfx.o scale_se.o karlin.o drop_tfx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
78		$(CC) $(HFLAGS) $(BIN)/tfastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	78	$(CC) $(HFLAGS) $(BIN)/tfastx36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
79	79
80	80	tfasty36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfy.o scale_se.o karlin.o drop_tfz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
81		$(CC) $(HFLAGS) $(BIN)/tfasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	81	$(CC) $(HFLAGS) $(BIN)/tfasty36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
82	82
83	83	tfasta36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o scale_se.o karlin.o $(DROPTFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
84		$(CC) $(HFLAGS) $(BIN)/tfasta36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M)
	84	$(CC) $(HFLAGS) $(BIN)/tfasta36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
85	85
86	86	tfastf36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(DROPTFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
87		$(CC) $(HFLAGS) $(BIN)/tfastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	87	$(CC) $(HFLAGS) $(BIN)/tfastf36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
88	88
89	89	tfastf36s : $(COMP_LIBO) $(COMPACC_SO) showsum.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o scaleswtf.o karlin.o $(DROPTFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
90		$(CC) $(HFLAGS) $(BIN)/tfastf36s $(COMP_LIBO) $(COMPACC_SO) showsum.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	90	$(CC) $(HFLAGS) $(BIN)/tfastf36s $(COMP_LIBO) $(COMPACC_SO) showsum.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
91	91
92	92	tfasts36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfs.o scaleswts.o tatstats_fs.o last_tat.o karlin.o $(DROPTFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
93		$(CC) $(HFLAGS) $(BIN)/tfasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o tatstats_fs.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	93	$(CC) $(HFLAGS) $(BIN)/tfasts36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o tatstats_fs.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
94	94
95	95	tfastm36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o scaleswts.o tatstats_fm.o last_tat.o karlin.o $(DROPTFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
96		$(CC) $(HFLAGS) $(BIN)/tfastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o $(DROPTFM_O) scaleswts.o tatstats_fm.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M)
	96	$(CC) $(HFLAGS) $(BIN)/tfastm36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o $(DROPTFM_O) scaleswts.o tatstats_fm.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB)
97	97
98	98	ssearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
99		$(CC) $(HFLAGS) $(BIN)/ssearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	99	$(CC) $(HFLAGS) $(BIN)/ssearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
100	100
101	101	# do not use accelerated Smith-Waterman
102	102	ssearch36s : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_NA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
103		$(CC) $(HFLAGS) $(BIN)/ssearch36s $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	103	$(CC) $(HFLAGS) $(BIN)/ssearch36s $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
104	104
105	105	lalign36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o scale_se.o karlin.o last_thresh.o $(DROPLAL_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
106		$(CC) $(HFLAGS) $(BIN)/lalign36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o $(DROPLAL_O) scale_se.o karlin.o last_thresh.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	106	$(CC) $(HFLAGS) $(BIN)/lalign36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o $(DROPLAL_O) scale_se.o karlin.o last_thresh.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
107	107
108	108	osearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ssw.o scale_se.o karlin.o $(DROPNSW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
109		$(CC) $(HFLAGS) $(BIN)/osearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ssw.o $(DROPNSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M)
	109	$(CC) $(HFLAGS) $(BIN)/osearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_ssw.o $(DROPNSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB)
110	110
111	111	glsearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPLNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
112		$(CC) $(HFLAGS) $(BIN)/glsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	112	$(CC) $(HFLAGS) $(BIN)/glsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
113	113
114	114	ggsearch36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPGNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
115		$(CC) $(HFLAGS) $(BIN)/ggsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	115	$(CC) $(HFLAGS) $(BIN)/ggsearch36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
116	116
117	117	prss36 : ssearch36
118	118	ln -sf ssearch36 prss36

145	145	$(CC) $(THR_CC) $(CFLAGS) -c work_thr2.c
146	146
147	147	print_pssm : print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c
148		$(CC) -o print_pssm $(CFLAGS) print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c $(LIB_M)
	148	$(CC) -o print_pssm $(CFLAGS) print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c $(LIB_M) $(LIB_DB)
149	149
150	150	map_db : map_db.c uascii.h ncbl2_head.h
151	151	$(CC) $(CFLAGS) -o $(BIN)/map_db map_db.c

+24

-24

make/Makefile.pcom_t

53	53	pushd $(BIN); cp $(TPROGS) $(XDIR); popd
54	54
55	55	lalign36 : $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o scale_se.o karlin.o last_thresh.o $(DROPLAL_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
56		$(CC) $(HFLAGS) $(BIN)/lalign36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o $(DROPLAL_O) scale_se.o karlin.o last_thresh.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M)
	56	$(CC) $(HFLAGS) $(BIN)/lalign36 $(COMP_LIBO) $(COMPACC_SO) $(SHOWBESTO) re_getlib.o $(LSHOWALIGN).o htime.o apam.o doinit.o init_lal.o $(DROPLAL_O) scale_se.o karlin.o last_thresh.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB)
57	57
58	58	ssearch36 : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
59		$(CC) $(HFLAGS) $(BIN)/ssearch36 $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	59	$(CC) $(HFLAGS) $(BIN)/ssearch36 $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
60	60
61	61	ssearch36s : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_se.o karlin.o $(DROPGSW_NA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
62		$(CC) $(HFLAGS) $(BIN)/ssearch36s $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	62	$(CC) $(HFLAGS) $(BIN)/ssearch36s $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGSW_NA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
63	63
64	64	glsearch36 : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPLNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
65		$(CC) $(HFLAGS) $(BIN)/glsearch36 $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	65	$(CC) $(HFLAGS) $(BIN)/glsearch36 $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
66	66
67	67	glsearch36s : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPLNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
68		$(CC) $(HFLAGS) $(BIN)/glsearch36s $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	68	$(CC) $(HFLAGS) $(BIN)/glsearch36s $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPLNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
69	69
70	70	ggsearch36 : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPGNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
71		$(CC) $(HFLAGS) $(BIN)/ggsearch36 $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	71	$(CC) $(HFLAGS) $(BIN)/ggsearch36 $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
72	72
73	73	ggsearch36s : $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o scale_sn.o karlin.o $(DROPGNW_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o
74		$(CC) $(HFLAGS) $(BIN)/ggsearch36s $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(THR_LIBS)
	74	$(CC) $(HFLAGS) $(BIN)/ggsearch36s $(COMP_THRO) ${WORK_THRO} $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o $(DROPGNW_O) scale_sn.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o pssm_asn_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
75	75
76	76	fasta36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
77		$(CC) $(HFLAGS) $(BIN)/fasta36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	77	$(CC) $(HFLAGS) $(BIN)/fasta36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
78	78
79	79	fasta36sum : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
80		$(CC) $(HFLAGS) $(BIN)/fasta36sum $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	80	$(CC) $(HFLAGS) $(BIN)/fasta36sum $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
81	81
82	82	fasta36u : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showun.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
83		$(CC) $(HFLAGS) $(BIN)/fasta36u $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showun.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	83	$(CC) $(HFLAGS) $(BIN)/fasta36u $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showun.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
84	84
85	85	fasta36r : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showrel.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o scale_se.o karlin.o $(DROPNFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
86		$(CC) $(HFLAGS) $(BIN)/fasta36r $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showrel.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	86	$(CC) $(HFLAGS) $(BIN)/fasta36r $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showrel.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fa.o $(DROPNFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
87	87
88	88	fastf36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(DROPFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
89		$(CC) $(HFLAGS) $(BIN)/fastf36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	89	$(CC) $(HFLAGS) $(BIN)/fastf36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
90	90
91	91	fastf36s : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o scaleswtf.o karlin.o $(DROPFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
92		$(CC) $(HFLAGS) $(BIN)/fastf36s $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	92	$(CC) $(HFLAGS) $(BIN)/fastf36s $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) showsum.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_ff.o $(DROPFF_O) scaleswtf.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
93	93
94	94	fasts36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fs.o scaleswts.o last_tat.o tatstats_fs.o karlin.o $(DROPFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
95		$(CC) $(HFLAGS) $(BIN)/fasts36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	95	$(CC) $(HFLAGS) $(BIN)/fasts36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fs.o $(DROPFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
96	96
97	97	fastm36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fm.o scaleswts.o last_tat.o tatstats_fm.o karlin.o $(DROPFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o
98		$(CC) $(HFLAGS) $(BIN)/fastm36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	98	$(CC) $(HFLAGS) $(BIN)/fastm36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fm.o $(DROPFM_O) scaleswts.o last_tat.o tatstats_fm.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
99	99
100	100	fastx36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_fx.o faatran.o scale_se.o karlin.o drop_fx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
101		$(CC) $(HFLAGS) $(BIN)/fastx36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fx.o drop_fx.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	101	$(CC) $(HFLAGS) $(BIN)/fastx36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fx.o drop_fx.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
102	102
103	103	fasty36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_fy.o faatran.o scale_se.o karlin.o drop_fz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o
104		$(CC) $(HFLAGS) $(BIN)/fasty36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fy.o drop_fz.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	104	$(CC) $(HFLAGS) $(BIN)/fasty36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_fy.o drop_fz.o faatran.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
105	105
106	106	tfasta36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_tfa.o scale_se.o karlin.o $(DROPTFA_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
107		$(CC) $(HFLAGS) $(BIN)/tfasta36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	107	$(CC) $(HFLAGS) $(BIN)/tfasta36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfa.o $(DROPTFA_O) scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
108	108
109	109	tfastf36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_tf.o scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(DROPTFF_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
110		$(CC) $(HFLAGS) $(BIN)/tfastf36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	110	$(CC) $(HFLAGS) $(BIN)/tfastf36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tf.o $(DROPTFF_O) scaleswtf.o last_tat.o tatstats_ff.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
111	111
112	112	tfasts36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o c_dispn.o htime.o apam.o doinit.o init_tfs.o scaleswts.o last_tat.o tatstats_fs.o karlin.o $(DROPTFS_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
113		$(CC) $(HFLAGS) $(BIN)/tfasts36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	113	$(CC) $(HFLAGS) $(BIN)/tfasts36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfs.o $(DROPTFS_O) scaleswts.o last_tat.o tatstats_fs.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
114	114
115	115	tfastm36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o scaleswts.o tatstats_fm.o last_tat.o karlin.o $(DROPTFM_O) $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o
116		$(CC) $(HFLAGS) $(BIN)/tfastm36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o $(DROPTFM_O) scaleswts.o tatstats_fm.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(THR_LIBS)
	116	$(CC) $(HFLAGS) $(BIN)/tfastm36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_S).o htime.o apam.o doinit.o init_tfm.o $(DROPTFM_O) scaleswts.o tatstats_fm.o last_tat.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o mrandom.o url_subs.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
117	117
118	118	tfastx36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfx.o scale_se.o karlin.o drop_tfx.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
119		$(CC) $(HFLAGS) $(BIN)/tfastx36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	119	$(CC) $(HFLAGS) $(BIN)/tfastx36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfx.o drop_tfx.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
120	120
121	121	tfasty36 : $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfy.o scale_se.o karlin.o drop_tfz.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o
122		$(CC) $(HFLAGS) $(BIN)/tfasty36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(THR_LIBS)
	122	$(CC) $(HFLAGS) $(BIN)/tfasty36 $(COMP_THRO) $(WORK_THRO) $(THR_SUBS).o $(COMPACC_TO) $(SHOWBESTO) re_getlib.o $(SHOWALIGN_T).o htime.o apam.o doinit.o init_tfy.o drop_tfz.o scale_se.o karlin.o $(LGETLIB) c_dispn.o $(NCBL_LIB) lib_sel.o faatran.o url_subs.o mrandom.o $(LIB_M) $(LIB_DB) $(THR_LIBS)
123	123
124	124	comp_mlib4.o : comp_lib4.c mw.h structs.h defs.h param.h
125	125	$(CC) $(THR_CC) $(CFLAGS) -DCOMP_MLIB -c comp_lib4.c -o comp_mlib4.o

167	167	$(CC) $(THR_CC) $(CFLAGS) -c work_thr2.c
168	168
169	169	print_pssm : print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c
170		$(CC) -o print_pssm $(CFLAGS) print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c $(LIB_M)
	170	$(CC) -o print_pssm $(CFLAGS) print_pssm.c getseq.c karlin.c apam.c pssm_asn_subs.c $(LIB_M) $(LIB_DB)
171	171
172	172	map_db : map_db.c uascii.h ncbl2_head.h
173	173	$(CC) $(CFLAGS) -o $(BIN)/map_db map_db.c

+2

-2

make/Makefile36m.common

33	33	# and "-L/usr/lib64/mysql -lmysqlclient -lz" in LIB_M
34	34	# some systems may also require a LD_LIBRARY_PATH change
35	35
36		LIB_M= -lm -lz
37		#LIB_M= -L/usr/lib64/mysql -lmysqlclient -lz -lm
	36	LIB_M= -lm
	37	#LIB_M= -L/usr/lib64/mysql -lmysqlclient -lm # -lz
38	38	NCBL_LIB=ncbl2_mlib.o
39	39	#NCBL_LIB=ncbl2_mlib.o mysql_lib.o
40	40

+101

-0

psisearch2/clustal2fasta.pl

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia */
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	################################################################
	20	# clustal2fasta.pl
	21	################################################################
	22	# clustal2fasta.pl takes a standard clustal format alignment file
	23	# and produces the corresponding FASTA file.
	24	#
	25	################################################################
	26
	27	use warnings;
	28	use strict;
	29	use Pod::Usage;
	30	use Getopt::Long;
	31
	32	my ($shelp, $help, $trim) = (0, 0);
	33
	34	GetOptions(
	35	"h|?" => \$shelp,
	36	"help" => \$help,
	37	);
	38
	39	pod2usage(1) if $shelp;
	40	pod2usage(exitstatus => 0, verbose => 2) if $help;
	41	unless (-f STDIN || -p STDIN || @ARGV) {
	42	pod2usage(1);
	43	}
	44
	45	my @seq_ids = ();
	46	my %msa = ();
	47
	48	# read the first line, first should not be blank
	49	my $title = <>;
	50
	51	while (my $line = <>) {
	52	chomp $line;
	53	next unless ($line);
	54	next if ($line =~ m/^[\s:\*\+\.]+$/); # skip conservation line
	55
	56	my ($seq_id, $align) = split(/\s+/,$line);
	57
	58	if (defined($msa{$seq_id})) {
	59	$msa{$seq_id} .= $align;
	60	}
	61	else {
	62	$msa{$seq_id} = $align;
	63	push @seq_ids, $seq_id;
	64	}
	65	}
	66
	67	for my $seq_id ( @seq_ids ) {
	68	my $fmt_seq = $msa{$seq_id};
	69	$fmt_seq =~ s/(.{0,60})/$1\n/g;
	70	print ">$seq_id\n$fmt_seq";
	71	}
	72
	73	__END__
	74
	75	=pod
	76
	77	=head1 NAME
	78
	79	clustal2fasta.pl
	80
	81	=head1 SYNOPSIS
	82
	83	clustal2fasta.pl clustal.msa
	84
	85	=head1 OPTIONS
	86
	87	-h short help
	88	--help include description
	89
	90
	91	=head1 DESCRIPTION
	92
	93	C<clustal2fasta.pl> takes a Clustal format interleaved multiple
	94	sequence alignment and produces the corresponding fasta format library.
	95
	96	=head1 AUTHOR
	97
	98	William R. Pearson, wrp@virginia.edu
	99
	100	=cut

+71

-0

psisearch2/clustal2fasta.py

	0	#!/usr/bin/env python
	1
	2	################################################################
	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia */
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	################################################################
	20	# clustal2fasta.py
	21	################################################################
	22	# clustal2fasta.py takes a standard clustal format alignment file
	23	# and produces the corresponding FASTA file.
	24	#
	25	# if --end_mask or --int_mask are set, then end or internal '-'s are converted to the query (first) sequence
	26	# if --trim is set, then alignments beyond the beginning/end of the query sequence are trimmed
	27	#
	28	################################################################
	29
	30	import argparse
	31	import fileinput
	32	import re
	33
	34	################
	35	#
	36	# python re-write of clustal2fasta.pl
	37	#
	38	# in the future, modify for various query seeding strategies
	39	################
	40
	41	arg_parse = argparse.ArgumentParser(description='Convert clustal MSA to FASTA library')
	42	arg_parse.add_argument('--query', '--query_file', dest='query_file', action='store',help='query sequence file')
	43	arg_parse.add_argument('files', metavar='FILE', nargs='*', help='files to read, if empty, stdin is used')
	44	args=arg_parse.parse_args()
	45
	46	msa = {}
	47	seq_ids = []
	48
	49	is_line1 = True
	50	for line in fileinput.input(args.files):
	51	    if is_line1:
	52	        is_line1 = False
	53	        continue
	54	    line = line.strip()
	55	    if not line:
	56	        continue
	57	    if re.search(r'^[\s:\*\+\.]+$',line):
	58	        continue
	59
	60	    (seq_id, align) = re.split(r'\s+',line)
	61
	62	    if seq_id in msa:
	63	        msa[seq_id] += align
	64	    else:
	65	        msa[seq_id] = align
	66	        seq_ids.append(seq_id)
	67
	68	for seq_id in seq_ids:
	69	    fmt_seq = re.sub(r'(.{0,60})',r'\1\n',msa[seq_id])
	70	    print(">%s\n%s" % (seq_id, fmt_seq))
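
The conversion that clustal2fasta.py performs can be sketched as a small Python 3 function. This is a minimal sketch, not the committed script: the function name `clustal_to_fasta`, the `width` parameter, and the tolerance for a trailing residue-count column on sequence lines are assumptions made for illustration.

```python
import re

# Conservation lines contain only whitespace and the marks : * + .
CONSERVATION = re.compile(r'^[\s:*+.]+$')


def clustal_to_fasta(lines, width=60):
    """Convert interleaved Clustal alignment lines to FASTA text.

    `lines` is any iterable of strings; the first line is treated as the
    Clustal title and skipped. `width` (hypothetical parameter) sets the
    number of residues per output line.
    """
    msa = {}  # seq_id -> concatenated alignment; dicts keep insertion order
    for n, raw in enumerate(lines):
        line = raw.strip()
        if n == 0 or not line or CONSERVATION.match(line):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a "seq_id alignment" line
        seq_id, align = fields[0], fields[1]
        msa[seq_id] = msa.get(seq_id, "") + align

    out = []
    for seq_id, seq in msa.items():
        out.append(">" + seq_id)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(s + "\n" for s in out)
```

Slicing the sequence into fixed-width chunks avoids the trailing empty match that a `(.{0,60})` substitution can produce, so no extra blank line follows each record.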

+68

-26

psisearch2/m89_btop_msa2.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

37	37	#
38	38	################################################################
39	39
	40	use warnings;
40	41	use strict;
41	42	use Pod::Usage;
42	43	use Getopt::Long;

50	51
51	52	my ($shelp, $help, $m_format, $evalue, $qvalue, $domain_bound) = (0, 0, "m8CB", 0.001, 30.0,0);
52	53	my ($query_file, $sel_file, $bound_file_in, $bound_file_only, $bound_file_out, $masked_lib_out,$mask_type_end, $mask_type_int) = ("","","","","","","","");
	54	my ($clustal_id,$trunc_acc,$min_align) = (0,0,0.0);
53	55	my $query_lib_r = 0;
54	56	my ($eval2_fmt, $eval2) = (0,"");
55	57

57	59	"query=s" => \$query_file,
58	60	"query_file=s" => \$query_file,
59	61	"eval2=s" => \$eval2, # change the evalue used for inclusion
60		"evalue=f" => \$evalue,
61		"expect=f" => \$evalue,
	62	"evalue|expect=f" => \$evalue,
62	63	"qvalue=f" => \$qvalue,
63	64	"format=s" => \$m_format,
64		"selected_file_in=s" => \$sel_file,
65		"sel_file_in=s" => \$sel_file,
66		"sel_file=s" => \$sel_file,
67		"m_format=s" => \$m_format,
68		"mformat=s" => \$m_format,
	65	"clustal!" => \$clustal_id,
	66	"trunc_acc!" => \$trunc_acc,
	67	"selected_file_in|sel_file_in|sel_accs=s" => \$sel_file,
	68	"m_format|mformat=s" => \$m_format,
69	69	"bound_file_in=s" => \$bound_file_in,
70	70	"bound_file_only=s" => \$bound_file_only,
71	71	"bound_file_out=s" => \$bound_file_out,

82	82	"domain" => \$domain_bound,
83	83	"int_mask_type=s" => \$mask_type_int,
84	84	"int_mask=s" => \$mask_type_int,
	85	"min_align=f" => \$min_align,
85	86	"h|?" => \$shelp,
86	87	"help" => \$help,
87	88	);

214	215	$q_acc = $query_descr;
215	216	}
216	217
217		$acc_names{$q_acc} = 1; # this is necessary for the new acc-only NCBI SwissProt libraries
	218	$acc_names{$q_acc} = $q_acc; # this is necessary for the new acc-only NCBI SwissProt libraries
218	219
219	220	$q_acc =~ s/\.\d+$//;
220	221

227	228	my $annot_f='NULL';
228	229
229	230	if ($m_format =~ m/^m9/i) {
230		last if $line =~ m/>>>/;
	231	last if $line =~ m/>>>/ || $line =~ m/^<\/pre>/;
231	232	next if $line =~ m/^\+\-/; # skip over HSPs
232	233	my ($left, $right, $align_f) = ("","",'NULL');
233	234	($left, $right, $align_f, $annot_f) = split(/\t/,$line);

235	236	$align_f= 'NULL' unless $align_f;
236	237	$annot_f= 'NULL' unless $annot_f;
237	238
	239	if ($left =~ m/<font/) {
	240	$left =~ s/<font color="darkred">//;
	241	$left =~ s/<\/font>//;
	242	}
	243
238	244	my @fields = split(/\s+/,$left);
239		my ($ldb, $l_id, $l_acc) = ("","","");
240		if ($fields[0] =~ m/:/) {
241		($ldb, $l_id) = split(/:/,$fields[0]);
242		($l_acc) = $fields[1];
243		} else {
244		($ldb, $l_acc,$l_id) = split(/\|/,$fields[0]);
245		}
	245	$subj_acc = $s_seqid = $fields[0];
	246
	247	# my ($ldb, $l_id, $l_acc) = ("","","");
	248	# if ($fields[0] =~ m/:/) {
	249	# ($ldb, $l_id) = split(/:/,$fields[0]);
	250	# ($l_acc) = $fields[1];
	251	# } else {
	252	# ($ldb, $l_acc,$l_id) = split(/\|/,$fields[0]);
	253	# }
246	254
247	255	@hit_data{@m9_field_names} = split(/\s+/,$right);
	256
248	257	if ($eval2_fmt) {
249	258	@hit_data{qw(bits evalue eval2)} = @fields[-3, -2,-1];
250	259	}

255	264	#
256	265	# currently preselbdr files have $ldb|$l_acc, not full s_seqid, so construct it
257	266	#
258		($s_seqid, $subj_acc) = (join('|',($ldb, $l_acc, $l_id)), "$ldb|$l_acc");
	267	# ($s_seqid, $subj_acc) = (join('|',($ldb, $l_acc, $l_id)), "$ldb|$l_acc");
259	268	@hit_data{qw(s_seqid subj_acc)} = ($s_seqid, $subj_acc);
260	269	@hit_data{qw(query_id query_acc)} = ($query_descr, $q_acc);
261	270	$hit_data{BTOP} = $align_f;

265	274	last if $line =~ m/^#/;
266	275	@hit_data{@m8_field_names} = split(/\t/,$line);
267	276	$subj_acc = $hit_data{'s_seqid'};
268		$subj_acc =~ s/^gi\|\d+\|(\w+\|\w+)\|?\w+/$1/;
	277	# remove gi number
	278	if ($subj_acc =~ m/^gi|\d+\|/) {
	279	$subj_acc =~ s/^gi\|\d+\|//;
	280	}
269	281	}
270	282
271	283	if ($have_sel_accs) {

284	296	# $s_seqid_u .= "_". $acc_names{$subj_acc};
285	297	}
286	298	else {
	299	my $tr_acc = $hit_data{'s_seqid'};
287	300	$acc_names{$hit_data{'s_seqid'}} = 1;
288	301	}
289	302
290	303	# must be after duplicate seqid check because blast HSP's have bad E-values after good.
291	304	next if ($eval_fptr->(\%hit_data) > $evalue);
292	305
	306	next if (($hit_data{q_end}-$hit_data{q_start}+1)/$query_len < $min_align);
	307
293	308	$hit_data{s_seqid_u} = $s_seqid_u;
294
295		if (length($s_seqid_u) > $max_sseqid_len) {
296		$max_sseqid_len = length($s_seqid_u);
297		}
298	309
299	310	my $have_dom = 0;
300	311	if ($domain_bound && $hit_data{annot}) {

369	380	}
370	381	}
371	382
	383	$max_sseqid_len = 10;
	384	for my $acc ( @multi_names) {
	385	my $this_len = length($acc);
	386	if ($trunc_acc && ($acc=~m/\|\w+\|(\w+)$/)) {
	387	$this_len = length($1);
	388	}
	389	if ($this_len > $max_sseqid_len) {
	390	$max_sseqid_len = $this_len;
	391	}
	392	}
	393
372	394	# final MSA output
373	395	$max_sseqid_len += 2;
374	396
375		printf "BTOP%s multiple sequence alignment\n\n\n",$m_format;
	397	if (! $clustal_id) {
	398	printf "BTOP%s multiple sequence alignment\n\n\n",$m_format;
	399	}
	400	else {
	401	print "CLUSTALW (1.8) multiple sequence alignment\n\n\n";
	402	}
376	403
377	404	my $i_pos = 0;
378	405	for (my $j = 0; $j < $query_len/60; $j++) {

380	407	if ($i_end >= $query_len) {$i_end = $query_len-1;}
381	408	for my $acc (@multi_names) {
382	409	next unless $acc;
383		printf("%-".$max_sseqid_len."s %s\n",$acc,join("",@{$multi_align{$acc}}[$i_pos .. $i_end]));
	410
	411	my $this_acc = $acc;
	412	if ($trunc_acc && ($acc=~m/\|\w+\|(\w+)$/)) {
	413	$this_acc = $1;
	414	}
	415	printf("%-".$max_sseqid_len."s %s\n",$this_acc,join("",@{$multi_align{$acc}}[$i_pos .. $i_end]));
384	416	}
385	417	$i_pos += 60;
386	418	print "\n\n";

752	784	my ($q_num, $query_desc, $q_start, $q_stop, $q_len, $l_num, $l_len, $best_yes);
753	785
754	786	while (my $line = <>) {
755		if ($line =~ m/^\s*(\d+)>>>(\S+)\s.+ \- (\d+) aa$/) {
	787	if ($line =~ m/^\s*(\d+)>>>(\S+)\s.*\- (\d+) aa$/) {
756	788	($q_num,$query_desc, $q_len) = ($1,$2,$3);
757	789	# ($q_len) = ($line =~ m/(\d+) aa$/);
758	790	$line = <>; # skip Library:

890	922	--query -- same as --query_file
891	923	(only one sequence per file)
892	924
	925	--expect|evalue: 0.001 -- maximum e-value to be included in output
	926
893	927	--eval2 : "": use E()-value, "eval2": use E2()/eval2, "ave": use geom. mean
	928
	929	--qvalue: 30.0 -- minimum qvalue for domain to be considered
894	930
895	931	--bound_file_in -- tab delimited accession<tab>start<tab>end that
896	932	specifies MSA boundaries WITHIN alignment.

903	939
904	940	--bound_file_out -- "--bound_file" for next iteration of psisearch2
905	941
	942	--clustal -- use "CLUSTALW (1.8)" multiple alignment string
	943
	944	--trunc_acc -- remove db, acc from db|acc|ident, e.g. sp|P0948|GSTM1_HUMAN becomes GSTM1_HUMAN
	945
906	946	--domain_bound parse domain annotations (-V) from m9B file
907	947	--domain
908	948
909	949	--masked_lib_out -- FASTA format library of MSA sequences
	950
	951	--min_align:0.0 -- minimum fractional alignment (q_end-q_start+1)/q_len
910	952
911	953	--int_mask_type = "query", "rand", "X", "none"
912	954	--end_mask_type = "query", "rand", "X", "none"

+80

-10

psisearch2/psisearch2_msa.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2016 by William R. Pearson and The Rector &

16	16	# governing permissions and limitations under the License.
17	17	################################################################
18	18
	19	use warnings;
19	20	use strict;
20	21	use Getopt::Long;
21	22	use Pod::Usage;

32	33	################
33	34	#
34	35	# command:
35		# psisearch2_msa.pl --query query.file --db database.file --num_iter N --pssm_evalue 0.002 --int_mask none/query/random --end_mask none/query/random --tmp_dir results/ --domain --align --out_suffix none --pgm ssearch/psiblast --prev_m89res prev_results.itx.m8CB.file --sel_res selected_accs.file --prev_bounds boundary.file
	36	# psisearch2_msa.pl --query query.file --in_msa msa.file --db database.file --num_iter N --pssm_evalue 0.002 --int_mask none/query/random --end_mask none/query/random --tmp_dir results/ --domain --align --out_suffix none --pgm ssearch/psiblast --prev_m89res prev_results.itx.m8CB.file --sel_res selected_accs.file --prev_bounds boundary.file
36	37	#
37	38	################
38	39

53	54	my $makeblastdb_bin = "$pgm_bin/makeblastdb";
54	55	my $datatool_bin = "$pgm_bin/datatool -m $pgm_data/NCBI_all.asn";
55	56	my $align2msa_lib = "$pgm_bin/m89_btop_msa2.pl";
	57	my $clustal2fasta = "$pgm_bin/clustal2fasta.pl";
56	58
57	59	my %srch_subs = ('ssearch' => \&get_ssearch_cmd,
58	60	'psiblast' => \&get_psiblast_cmd,

60	62
61	63	my %annot_cmds = ('rpd3' => qq("\!ann_pfam28.pl --pfacc --db RPD3 --vdoms --split_over"),
62	64	'rpd3nv' => qq("\!ann_pfam28.pl --pfacc --db RPD3 --split_over"),
63		'rpd3nvn' => qq("\!ann_pfam28.pl --pfacc --db RPD3 --split_over --neg"),
64		'pfam' => qq("\!ann_pfam30.pl --vdoms --split_over --neg")
	65	'rpd3nvn' => qq("\!./annot/ann_pfam28.pl --pfacc --db RPD3 --split_over --neg"),
	66	'pfam' => qq("\!./annot/ann_pfam30.pl --db pfam31_qfo --vdoms --split_over --neg")
65	67	);
66	68
67	69	($num_iter, $pssm_evalue, $srch_evalue, $dom_flag, $align_flag, $int_mask, $end_mask, $query_mask, $srch_pgm, $tmp_dir, $error_log, $annot_type, $quiet) =
68	70	( 5, 0.002, 5.0, 0, 0, 'none', 'none', 0, 'ssearch','',0, 0, "", 0);
69	71	($save_all, $tmp_file_list, $delete_bnd, $delete_tmp) = (0, "", 0, 0);
70		($prev_m89res, $m_format, $prev_sel_res, $prev_bound, $this_iter, $use_stdout) = ("","", "","", 1, 0);
	72	($prev_m89res, $m_format, $prev_sel_res, $prev_bound, $this_iter, $use_stdout) = ("","m8CB", "","", 1, 0);
71	73
72	74	my $pgm_command = "# ".join(" ",($0,@ARGV));
73	75	print STDERR "# ",join(" ",($0,@ARGV)),"\n" if ($error_log);

89	91	'sel_accs=s' => \$prev_sel_res,
90	92	'sel_file=s' => \$prev_sel_res,
91	93	'sel_file_in=s' => \$prev_sel_res,
92		# 'in_msa=s' => \$prev_msa,
	94	'in_msa=s' => \$prev_msa,
93	95	# 'out_msa=s' => \$next_msa,
94	96	# 'in_hitdb=s' => \$prev_hitdb,
95	97	# 'out_hitdb=s' => \$next_hitdb,

183	185
184	186	my @del_err_files = ();
185	187
186		unless ($prev_m89res) {
	188	unless ($prev_m89res || $prev_msa) {
187	189	$search = $srch_subs{$srch_pgm}($query_file, $db_file, $prev_pssm);
188	190	unless ($use_stdout) {
189	191	log_system("$search > $this_file_out 2> $this_file_out.err");

194	196	push @del_err_files, "$this_file_out.err";
195	197	$first_iter++;
196	198	}
197		else {
	199	elsif ($prev_m89res) {
198	200	$this_file_out = $prev_m89res;
	201	}
	202	elsif ($prev_msa) {
	203	# build a PSSM, do a search, up the iteration count
	204	$prev_pssm = pssm_from_msa($query_file, $prev_msa);
	205	$search = $srch_subs{$srch_pgm}($query_file, $db_file, $prev_pssm);
	206	unless ($use_stdout) {
	207	log_system("$search > $this_file_out 2> $this_file_out.err");
	208	}
	209	else {
	210	log_system("$search 2> $this_file_out.err");
	211	}
	212	push @del_err_files, "$this_file_out.err";
	213	$first_iter++;
199	214	}
200	215
201	216	my ($this_pssm, $this_bound_out) = ("","");

264	279
265	280	my ($cmd) = @_;
266	281
267		print STDERR "$cmd\n" if $error_log;
	282	print STDERR "# $cmd\n" if $error_log;
268	283	system($cmd);
269	284	}
270	285

275	290	sub get_ssearch_cmd {
276	291	my ($query_file, $db_file, $pssm_file) = @_;
277	292
278		my $search_cmd = qq($ssearch_bin -S -m 6 -m 9B -E "$srch_evalue 0" -s BP62);
	293	my $mf_arg = $m_format;
	294	$mf_arg =~ s/^m//;
	295	$mf_arg =~ s/\+/ /;
	296
	297	my $search_cmd = qq($ssearch_bin -S -E "$srch_evalue 0" -s BP62 -m $mf_arg);
	298
279	299	if ($annot_type) {
280	300	$search_cmd .= qq( -V $annot_cmds{$annot_type});
281	301	}

383	403	}
384	404	else {
385	405	return ($this_pssm_asntxt, $this_bound_out);
	406	}
	407	}
	408
	409	################
	410	# pssm_from_msa()
	411	#
	412	# given query, --in_msa Clustal MSA
	413	# use psiblast to generate PSSM in .asntxt or .asnbin format
	414	# (later - optionally deletes intermediate files)
	415	#
	416	# always produce a $bound_file_out file to test for convergence
	417	#
	418	sub pssm_from_msa {
	419	my ($query_file, $msa_file) = @_;
	420
	421	my $this_file_out = $query_file;
	422
	423	my ($this_hit_db, $this_pssm_asntxt, $this_pssm_asnbin, $this_psibl_out, $this_bound_out) =
	424	("$this_file_out.hit_db",
	425	"$this_file_out.asntxt",
	426	"$this_file_out.asnbin",
	427	"$this_file_out.psibl_out",
	428	"$this_file_out.bnd_out",
	429	);
	430
	431	my $blastdb_err = "$this_file_out.mkbldb_err";
	432	## should not need this, but may need to convert in_msa file to fasta file for equivalence to build_msa_pssm()
	433	my $clus2fa_cmd = qq($clustal2fasta $msa_file > $this_hit_db);
	434
	435	log_system($clus2fa_cmd);
	436
	437	my $makeblastdb_cmd = "$makeblastdb_bin -in $this_hit_db -dbtype prot -parse_seqids > $blastdb_err";
	438	log_system($makeblastdb_cmd);
	439
	440	my $buildpssm_cmd = "$psiblast_bin -max_target_seqs 5000 -outfmt 7 -inclusion_ethresh 100.0 -in_msa $msa_file -db $this_hit_db -out_pssm $this_pssm_asntxt -num_iterations 1 -save_pssm_after_last_round";
	441
	442	log_system("$buildpssm_cmd > $this_psibl_out 2> $this_psibl_out.err");
	443
	444	log_system("rm $this_hit_db.p* $blastdb_err");
	445
	446	# remove uninformative error logs
	447	log_system("rm $this_psibl_out.err") unless $error_log;
	448
	449	unless ($srch_pgm eq 'psiblast') {
	450	my $asn2asn_cmd = "$datatool_bin -v $this_pssm_asntxt -e $this_pssm_asnbin";
	451	log_system($asn2asn_cmd);
	452	return ($this_pssm_asnbin);
	453	}
	454	else {
	455	return ($this_pssm_asntxt);
386	456	}
387	457	}
388	458
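The `get_ssearch_cmd` change above turns the `--m_format` value (default `m8CB`) into the argument of a single `-m` option by stripping a leading `m` and replacing the first `+` with a space. A minimal Python sketch of that transformation follows; the function names are hypothetical and the `m9+x` test value is only an illustration, not a documented format.

```python
def m_format_to_arg(m_format):
    """Mirror the two Perl substitutions: s/^m// and then s/\\+/ / (first '+' only)."""
    arg = m_format[1:] if m_format.startswith("m") else m_format
    return arg.replace("+", " ", 1)


def ssearch_cmd_sketch(ssearch_bin, srch_evalue, m_format):
    """Assemble the option string in the same order as the Perl qq() template."""
    return '%s -S -E "%s 0" -s BP62 -m %s' % (
        ssearch_bin, srch_evalue, m_format_to_arg(m_format))


if __name__ == "__main__":
    # 'ssearch36' is an assumed binary name used only for this demonstration
    print(ssearch_cmd_sketch("ssearch36", "5.0", "m8CB"))
```

Keeping the conversion in one helper makes it easy to check that a value without the leading `m` passes through unchanged.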

+96

-35

psisearch2/psisearch2_msa.py

0		#!/usr/bin/python
	0	#!/usr/bin/env python
1	1
2	2	################################################################
3	3	# copyright (c) 2016 by William R. Pearson and The Rector &

33	33	################
34	34	#
35	35	# command:
36		# psisearch2_msa.py --query query_file --db database --num_iter N --evalue 0.002 --no_msa --int_mask none/query/random --end_mask none/query/random --tmp_dir results/ --domain --align --suffix M8CB --pgm ssearch/psiblast --prev_m89res pre_iter.out --this_iter # --num_iter #
	36	# psisearch2_msa.py --query query_file --db database --num_iter N --pssm_evalue 0.002 --no_msa --int_mask none/query/random --end_mask none/query/random --tmp_dir results/ --domain --align --suffix M8CB --pgm ssearch/psiblast --prev_m89res pre_iter.out --this_iter # --num_iter #
37	37	#
38	38	################
39	39

51	51	makeblastdb_bin = pgm_bin+"/makeblastdb"
52	52	datatool_bin = "%s/datatool -m %s/NCBI_all.asn" % (pgm_bin,pgm_data)
53	53	align2msa_lib = "m89_btop_msa2.pl"
	54	clustal2fasta = "clustal2fasta.py"
54	55
55	56	annot_cmds = {'rpd3': '"!../scripts/ann_pfam28.pl --pfacc --db RPD3 --vdoms --split_over"',
56	57	'rpd3nv':'"!../scripts/ann_pfam28.pl --pfacc --db RPD3 --split_over"',
57	58	'pfam':'"!../scripts/ann_pfam30.pl --pfacc --vdoms --split_over"'}
58	59
59	60	num_iter = 5
60		evalue = 0.002
61	61	srch_pgm = 'ssearch'
62		error_log = 0
63	62	rm_flag = 0
64	63	quiet = 0
65	64
66	65	################
67	66	# log_system()
68		# run system on string, logging first if error_log
	67	# run system on string, logging first if args.error_log
69	68	#
70	69	def log_system (cmd, error_log):
71	70

79	78	# sub get_ssearch_cmd()
80	79	# builds an ssearch command line with query, db, and pssm
81	80	#
82		def get_ssearch_cmd(query_file, db_file, pssm_file) :
83
84		search_cmd = '%s -S -m 8CB -d 0 -E "1.0 0" -s BP62' % (ssearch_bin)
	81	def get_ssearch_cmd(query_file, db_file, pssm_file, args) :
	82
	83	search_cmd = '%s -S -m 8CB -d 0 -E "%f 0" -s BP62' % (ssearch_bin, args.srch_evalue)
85	84
86	85	if (args.annot_type) :
87	86	search_cmd += " -V %s" % (annot_cmds[args.annot_type])

98	97	# sub get_psiblast_cmd()
99	98	# builds a psiblast command line with query, db, and pssm
100	99	#
101		def get_psiblast_cmd(query_file, db_file, pssm_file) :
102
103		search_cmd = "%s -num_threads 4 -max_target_seqs 5000 -outfmt '7 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore score btop' -inclusion_ethresh %f -num_iterations 1 -db %s" % (psiblast_bin, args.evalue, db_file)
	100	def get_psiblast_cmd(query_file, db_file, pssm_file, args) :
	101
	102	search_cmd = "%s -num_threads 4 -max_target_seqs 5000 -outfmt '7 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore score btop' -inclusion_ethresh %f -evalue %f -num_iterations 1 -db %s" % (psiblast_bin, args.pssm_evalue, args.srch_evalue, db_file)
104	103
105	104	if (pssm_file) :
106	105	search_cmd += " -in_pssm %s" % (pssm_file)

119	118	#
120	119	# always produce a bound_file_out file to test for convergence
121	120	#
122		def build_msa_pssm(query_file, this_file_out,prev_bound_in, prev_sel_res, args, error_log) :
	121	def build_msa_pssm(query_file, this_file_out,prev_bound_in, prev_sel_res, error_log) :
123	122
124	123	(this_msa, this_hit_db, this_pssm_asntxt, this_pssm_asnbin, this_psibl_out, this_bound_out) = (this_file_out+".msa",this_file_out+".hit_db",this_file_out+".asntxt",this_file_out+".asnbin",this_file_out+".psibl_out",this_file_out+".bnd_out")
125	124

129	128	if (prev_sel_res) :
130	129	aln2msa_cmd += " --sel_res %s" % (prev_sel_res)
131	130	else:
132		aln2msa_cmd += " --evalue %f" % (args.evalue)
	131	aln2msa_cmd += " --evalue %f" % (args.pssm_evalue)
133	132
134	133	if (args.int_mask) :
135	134	aln2msa_cmd += " --int_mask_type %s" % (args.int_mask)

141	140	aln2msa_cmd += " --domain"
142	141
143	142	if (args.align_flag and args.prev_bound_in) :
144		aln2msa_cmd += " --bound_file_in %s" %(args.prev_bound_in)
	143	aln2msa_cmd += " --bound_file_in %s" %(args.prev_bound_in)
	144
	145	if (args.m_format):
	146	aln2msa_cmd += " --m_format %s" % (args.m_format)
145	147
146	148	# always produce this file to check for convergence
147	149	aln2msa_cmd += " --bound_file_out %s" % (this_bound_out)

170	172	return (this_pssm_asntxt, this_bound_out)
171	173
172	174	################
	175	# sub pssm_from_msa
	176	# read multiple sequence alignment, produce pssm file
	177	#
	178	def pssm_from_msa(query_file, msa_file, error_log):
	179
	180	this_file_out = query_file
	181
	182	this_hit_db = this_file_out+".hit_db"
	183	this_pssm_asntxt = this_file_out+".asntxt"
	184	this_pssm_asnbin = this_file_out+".asnbin"
	185	this_psibl_out = this_file_out+".psibl_out"
	186	this_bound_out = this_file_out+".bnd_out"
	187
	188	blastdb_err = this_file_out + ".mkbldb_err"
	189
	190	clus2fa_cmd = "%s %s > %s" % (clustal2fasta, msa_file, this_hit_db)
	191
	192	log_system(clus2fa_cmd, error_log);
	193
	194	makeblastdb_cmd = "%s -in %s -dbtype prot -parse_seqids > %s" % (makeblastdb_bin, this_hit_db, blastdb_err);
	195	log_system(makeblastdb_cmd, error_log);
	196
	197	buildpssm_cmd = "%s -max_target_seqs 5000 -outfmt 7 -inclusion_ethresh 100.0 -in_msa %s -db %s -out_pssm %s -num_iterations 1 -save_pssm_after_last_round" % (psiblast_bin, msa_file, this_hit_db, this_pssm_asntxt)
	198
	199	log_system("%s > %s 2> %s.err" % (buildpssm_cmd, this_psibl_out, this_psibl_out), error_log)
	200
	201	log_system("rm %s.p* %s" % (this_hit_db,blastdb_err), error_log)
	202
	203	# remove uninformative error logs
	204	if (not error_log):
	205	log_system("rm %s.err" % (this_psibl_out), error_log)
	206
	207	if (srch_pgm != 'psiblast'):
	208	asn2asn_cmd = "%s -v %s -e %s" % (datatool_bin, this_pssm_asntxt, this_pssm_asnbin)
	209	log_system(asn2asn_cmd, error_log);
	210	return this_pssm_asnbin
	211	else:
	212	return this_pssm_asntxt
	213
	214	################
173	215	# sub has_converged()
174	216	# reads two boundary files and compares accessions
175	217	#

210	252
211	253	srch_subs = {'ssearch' : get_ssearch_cmd,
212	254	'psiblast': get_psiblast_cmd}
213
214		pgm_command = "# "+" ".join(sys.argv);
215		if (error_log) :
216		sys.stderr.write('pgm_command\n')
217	255
218	256	arg_parse = argparse.ArgumentParser(description='Iterative search with SSEARCH/PSIBLAST')
219	257	arg_parse.add_argument('--query', dest='query_file', action='store',help='query sequence file')

221	259	arg_parse.add_argument('--db', dest='db_file', action='store',help='sequence database name')
222	260	arg_parse.add_argument('--database', dest='db_file', action='store',help='sequence database name')
223	261	arg_parse.add_argument('--dir', dest='tmp_dir', action='store',help='directory for result and tmp_file output')
224		arg_parse.add_argument('--evalue', dest='evalue', default=0.002, type=float, action='store',help='E()-value threshold for inclusion in PSSM')
	262	arg_parse.add_argument('--pssm_evalue', dest='pssm_evalue', default=0.002, type=float, action='store',help='E()-value threshold for inclusion in PSSM')
	263	arg_parse.add_argument('--search_evalue', dest='srch_evalue', default=5.0, type=float, action='store',help='E()-value threshold for search display')
	264	arg_parse.add_argument('--m_format', dest='m_format', action='store',help='input result format m8 [def] or m9')
225	265	arg_parse.add_argument('--annot_db', dest='annot_type', action='store',help='source of domain annotations')
226	266	arg_parse.add_argument('--suffix', dest='suffix', action='store',help='suffix for result output')
227	267	arg_parse.add_argument('--out_name', dest='file_out', action='store',help='result file name')

233	273	arg_parse.add_argument('--pgm', dest='srch_pgm', action='store',default='ssearch',help='search program: ssearch/psiblast')
234	274	arg_parse.add_argument('--query_seed', dest='query_mask', action='store_true',help='use query seeding')
235	275	arg_parse.add_argument('--prev_m89res', dest='prev_m89res', action='store', help='previous iteration result')
	276	arg_parse.add_argument('--prev_msa', dest='prev_msa', action='store', help='previous MSA')
236	277	arg_parse.add_argument('--sel_res', dest='prev_sel_res', action='store', help='selected accession file')
237	278	arg_parse.add_argument('--this_iter', dest='this_iter', help='this iteration number',type=int)
238	279	arg_parse.add_argument('--int_seed', dest='int_mask', action='store',default='none',help='sequence masking: none/query/random')

243	284	arg_parse.add_argument('--save_all', dest='save_all', action='store_true',help='save all temporary files')
244	285	arg_parse.add_argument('--delete_all', dest='delete_tmp', action='store_true',help='delete all temporary files')
245	286	arg_parse.add_argument('--delete_bnd', dest='delete_bnd', action='store_true',help='delete boundary temporary file')
	287	arg_parse.add_argument('--use_stdout', dest='use_stdout', action='store_true',help='send results to stdout',default=False)
	288	arg_parse.add_argument('--errors', dest='error_log', action='store_true', help='log errors', default=False)
246	289	arg_parse.add_argument('--quiet', dest='quiet', action='store_true',help='fewer messages')
247	290	arg_parse.add_argument('-Q', dest='quiet', action='store_true',help='fewer messages')
248	291
249	292	args = arg_parse.parse_args()
	293
	294	pgm_command = "# "+" ".join(sys.argv);
	295	if (args.error_log) :
	296	sys.stderr.write(pgm_command+"\n")
	297
250	298	if (args.quiet) :
251	299	quiet = args.quiet
252	300

317	365	del_err_files = []
318	366
319	367	# do the first search
320		if (not args.prev_m89res):
321		search_str = srch_subs[srch_pgm](args.query_file, args.db_file, args.prev_pssm)
322		log_system(search_str+" > "+this_file_out+" 2> "+this_file_out+".err", error_log)
	368	if (not (args.prev_m89res or args.prev_msa)):
	369	search_str = srch_subs[srch_pgm](args.query_file, args.db_file, args.prev_pssm, args)
	370	if (not args.use_stdout):
	371	log_system(search_str+" > "+this_file_out+" 2> "+this_file_out+".err", args.error_log)
	372	else:
	373	log_system(search_str + " 2> "+this_file_out+".err", args.error_log)
323	374	del_err_files.append(this_file_out+".err")
324	375	first_iter += 1
325		else:
	376	elif (args.prev_m89res):
326	377	this_file_out = args.prev_m89res
327
	378	elif (args.prev_msa):
	379	# build a PSSM, do a search, up the iteration count
	380	prev_pssm = pssm_from_msa(args.query_file, args.prev_msa, args.error_log)
	381	search_str = srch_subs[srch_pgm](args.query_file, args.db_file, prev_pssm, args)
	382	if (not args.use_stdout):
	383	log_system(search_str + " > " + this_file_out + " 2> " + this_file_out + ".err", args.error_log)
	384	else:
	385	log_system(search_str + " 2> " + this_file_out + ".err", args.error_log)
	386
	387	del_err_files.append(this_file_out+".err")
	388	first_iter += 1
328	389
329	390	it=first_iter
330	391

332	393
333	394	while (it < args.num_iter) :
334	395
335		(this_pssm, this_bound_out) = build_msa_pssm(args.query_file, this_file_out, prev_bound_in, arg.prev_sel_res, error_log)
	396	(this_pssm, this_bound_out) = build_msa_pssm(args.query_file, this_file_out, prev_bound_in, args.prev_sel_res, args.error_log)
336	397	prev_file_out = this_file_out
337		arg.prev_sel_res = ''
	398	args.prev_sel_res = ''
338	399
339	400	iter_val = this_iter + it
340	401

347	408	if (args.tmp_dir) :
348	409	this_file_out = args.tmp_dir+"/"+this_file_out
349	410
350		search_str = srch_subs[srch_pgm](args.query_file, args.db_file, prev_pssm)
351		log_system("%s > %s 2> %s" % (search_str,this_file_out,this_file_out+".err"), error_log)
	411	search_str = srch_subs[srch_pgm](args.query_file, args.db_file, prev_pssm, args)
	412	log_system("%s > %s 2> %s" % (search_str,this_file_out,this_file_out+".err"), args.error_log)
352	413	del_err_files.append(this_file_out+".err")
353	414
354	415	if (len(del_file_ext)):
355	416	del_file_list = [ prev_file_out+'.'+ext for ext in del_file_ext]
356		log_system('rm '+' '.join(del_file_list),error_log)
	417	log_system('rm '+' '.join(del_file_list),args.error_log)
357	418
358	419	if (has_converged(prev_bound_in, this_bound_out)) :
359	420	if (not quiet) :

361	422
362	423	# if (len(del_file_ext)):
363	424	# del_file_list = [ prev_file_out+'.'+ext for ext in del_file_ext]
364		# log_system('rm '+' '.join(del_file_list),error_log)
	425	# log_system('rm '+' '.join(del_file_list),args.error_log)
365	426
366	427	if (delete_bnd) :
367		log_system("rm "+prev_bound_in,error_log)
	428	log_system("rm "+prev_bound_in,args.error_log)
368	429
369	430	exit(0)
370	431
371	432	if (delete_bnd) :
372		log_system("rm "+prev_bound_in,error_log)
	433	log_system("rm "+prev_bound_in,args.error_log)
373	434	prev_bound_in = this_bound_out
374	435
375	436	it += 1
376	437
377	438	if (len(del_err_files)):
378		log_system('rm '+' '.join(del_err_files),error_log)
	439	log_system('rm '+' '.join(del_err_files),args.error_log)
379	440
380	441	# if (len(del_file_ext)):
381	442	# del_file_list = [ prev_file_out+'.'+ext for ext in del_file_ext]
382		# log_system('rm '+' '.join(del_file_list),error_log)
	443	# log_system('rm '+' '.join(del_file_list),args.error_log)
383	444
384	445	if (delete_bnd):
385		log_system("rm "+this_bound_out,error_log)
	446	log_system("rm "+this_bound_out,args.error_log)
386	447
387	448	if (not quiet) :
388	449	sys.stderr.write(" %s %s %s %s finished (%d iterations)\n" % (sys.argv[0], srch_pgm, query_file, args.db_file, it))
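The loop above stops early when `has_converged(prev_bound_in, this_bound_out)` is true; the diff only says that this function "reads two boundary files and compares accessions". The sketch below shows one way such a check could work. The file layout (tab-delimited, accession in the first column, `#` comment lines) and both function names are assumptions for illustration, not the script's actual format or code.

```python
def read_accessions(path):
    """Collect the first tab-delimited field of each non-comment line (assumed layout)."""
    accs = set()
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            accs.add(line.split("\t")[0])
    return accs


def has_converged_sketch(prev_bound_file, this_bound_file):
    """Report convergence when both boundary files list the same set of accessions."""
    if not prev_bound_file:
        # no previous boundary file yet (first iteration), so nothing to compare
        return False
    return read_accessions(prev_bound_file) == read_accessions(this_bound_file)
```

Comparing sets ignores the order in which accessions appear, so a reshuffled hit list with unchanged membership still counts as converged under this assumption.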

+36

-0

psisearch2/psisearch2_msa_iter.sh

	0	#!/bin/sh
	1
	2	################
	3	# example that runs psisearch2_msa.pl iteratively through 5 iterations.
	4	# Equivalent to:
	5	# psisearch2_msa.pl --query CL0238_emb.fa --num_iter 5 --db /slib2/fa_dbs/rpd3_pfam28_lib.lseg
	6	#
	7
	8
	9	PS_BIN=~/Devel/fa36_v3.8/psisearch2
	10	Q_DIR="../seq"
	11	FA_DB=/slib2/fa_dbs/qfo78.lseg
	12	BL_DB=/slib2/bl_dbs/qfo78
	13	DB=$FA_DB
	14
	15	OUT_SUFF='qm8CB'
	16
	17	M_FORMAT='m8CB'
	18	ITERS='2 3 4 5'
	19
	20	for q_file_p in $*; do
	21
	22	q_file=${q_file_p##*/}
	23	echo $q_file
	24
	25	# iteration 1:
	26
	27	$PS_BIN/psisearch2_msa.pl --query $Q_DIR/$q_file --num_iter 1 --db $DB --int_mask query --end_mask query --out_suffix $OUT_SUFF --m_format $M_FORMAT
	28
	29	# iteration 2 - 5
	30	for it in $ITERS; do
	31	prev=$(($it-1))
	32	$PS_BIN/psisearch2_msa.pl --query $Q_DIR/$q_file --num_iter 1 --db $DB --int_mask query --end_mask query --out_suffix $OUT_SUFF --this_iter $it --prev_m89res $q_file.it${prev}.$OUT_SUFF --m_format $M_FORMAT
	33	done
	34
	35	done
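The shell script above chains single-iteration runs by passing each run's result file, named `<query>.it<N>.<suffix>`, to the next run through `--prev_m89res`. The Python sketch below rebuilds the same command lines so the naming pattern can be inspected without running any search; `iter_commands` is a hypothetical helper, and the option names are copied from the script.

```python
def iter_commands(ps_bin, q_dir, q_file, db,
                  out_suff="qm8CB", m_format="m8CB", last_iter=5):
    """Return the psisearch2_msa.pl command lines for iterations 1..last_iter."""
    base = ("%s/psisearch2_msa.pl --query %s/%s --num_iter 1 --db %s "
            "--int_mask query --end_mask query --out_suffix %s"
            % (ps_bin, q_dir, q_file, db, out_suff))
    # iteration 1 has no previous result to read
    cmds = [base + " --m_format %s" % m_format]
    for it in range(2, last_iter + 1):
        # each later iteration reads the result file written by the one before it
        prev_res = "%s.it%d.%s" % (q_file, it - 1, out_suff)
        cmds.append(base + " --this_iter %d --prev_m89res %s --m_format %s"
                    % (it, prev_res, m_format))
    return cmds


if __name__ == "__main__":
    for cmd in iter_commands("~/Devel/fa36_v3.8/psisearch2", "../seq",
                             "CL0238_emb.fa", "/slib2/fa_dbs/qfo78.lseg"):
        print(cmd)
```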

+31

-0

psisearch2/psisearch2_msa_iter_bl.sh

	0	#!/bin/sh
	1
	2	################
	3	# example that runs psisearch2_msa.pl iteratively through 5 iterations using psiblast instead of ssearch
	4	# Equivalent to:
	5	# psisearch2_msa.pl --pgm psiblast --query query.aa --num_iter 5 --db /slib2/bl_dbs/qfo78
	6	#
	7
	8	PS_BIN=~/Devel/fa36_v3.8/psisearch2
	9	q_file=$1
	10	m_format='m8CB'
	11	SRC_QDIR=../hum_1dom200_queries
	12
	13	iters='2 3 4 5'
	14	# iters=''
	15
	16	for q_file_p in $*; do
	17
	18	q_file=${q_file_p##*/}
	19	echo $q_file
	20
	21	# iteration 1:
	22	# echo "$PS_BIN/psisearch2_msa.pl --pgm psiblast --query $SRC_QDIR/$q_file --num_iter 1 --db /slib2/bl_dbs/qfo78 --int_mask query --end_mask query --out_suffix q_pblt --m_format $m_format --save_list asnbin"
	23	$PS_BIN/psisearch2_msa.pl --pgm psiblast --query $SRC_QDIR/$q_file --num_iter 1 --db /slib2/bl_dbs/qfo78 --int_mask query --end_mask query --out_suffix q_pblt --m_format $m_format --save_list asntxt
	24
	25	# iteration 2 - 5
	26	for it in $iters; do
	27	prev=$(($it-1))
	28	$PS_BIN/psisearch2_msa.pl --pgm psiblast --query $SRC_QDIR/$q_file --num_iter 1 --db /slib2/bl_dbs/qfo78 --int_mask query --end_mask query --out_suffix q_pblt --this_iter $it --prev_m89res $q_file.it${prev}.q_pblt --m_format $m_format --save_list asntxt
	29	done
	30	done

+37

-5

scripts/README

0	0
1	1	22-Jan-2014
2	2	13-Apr-2016 updated
	3	22-Feb-2019 updated
3	4
4	5	fasta36/scripts
5	6
6	7	Perl scripts for annotating sequences and expanding libraries
	8
	9	-- Sequence generation (January, February, 2019)
	10
	11	The FASTA programs can now use sequences that are downloaded from
	12	Uniprot or NCBI/RefSeq (or otherwise provided by a program script that
	13	produces FASTA sequences from an identifier) by specifying the name of
	14	the script, the accession(s), and library type 9, e.g.
	15
	16	fasta36 \!../scripts/get_protein.py+P09488 /seqlib/swissprot.fasta
	17
	18	Scripts are available for downloading protein sequences from either
	19	Uniprot or RefSeq (get_protein.py), from Uniprot only (get_uniprot.py),
	20	and for downloading either protein or mRNA sequences from RefSeq
	21	(get_refseq.py).
	22
	23	scripts/get_protein.py get Refseq or Uniprot proteins
	24	scripts/get_refseq.py get RefSeq proteins or mRNAs
	25	scripts/get_up_prot_iso_sql.py get a protein and its isoforms using a mysql database
	26	scripts/get_genome_seq.py get human (hg38) or mouse (--genome mm10) genome sequences using bedtools, e.g. "get_genome_seq.py chr1:123456-126543"
7	27
8	28	-- Sequence alignment scoring/annotation
9	29

82	102	ann_pdb_cath.pl -- generate CATH domains using PDB accessions from a mySQL database
83	103	ann_pdb_vast.pl -- use VAST domains, but domain names are not informative
84	104
85		ann_pfam27.pl -- generate Pfam domains using local Pfam mySQL database (Pfam27 with auto_pfamA, auto_pfamseq)
86		ann_pfam28.pl -- generate Pfam domains using local Pfam mySQL database (Pfam28, no auto_pfamA, auto_pfamseq)
	105	ann_pfam28.pl -- generate Pfam domains using local Pfam mySQL database
	106	(Pfam28, no auto_pfamA, auto_pfamseq)
	107
87	108	ann_pfam_www.pl -- use Pfam Website, and XML::Twig, to get Pfam domain info.
88	109
89		ann_exons_ens.pl -- generate exon boundaries on SwissProt proteins from Ensembl.
90		ann_exons_up_www.pl -- generate exon boundaries on SwissProt proteins using the EBI/Proteins/API/coordinate service
	110	ann_exons_up_www.pl -- generate exon boundaries on Uniprot proteins
	111	using the EBI/Proteins/API/coordinate service
	112
	113	ann_exons_up_sql_www.pl -- generate exon boundaries on Uniprot
	114	proteins using an SQL database (if available) or the EBI/Proteins
	115	coordinate service. The SQL results are dramatically faster.
	116
91	117	ann_exons_ncbi.pl -- generate exon boundaries on NCBI refseq proteins.
92	118
93	119	-- Library expansion
94	120
	121	expand_up_isoforms.pl -- for Uniprot reference proteomes, provide
	122	isoforms for each canonical sequence.
	123
95	124	expand_uniref50.pl -- allows search of uniref50 to be expanded
96		expand_links.pl -- script to take hits from a smaller library and expand to complete library
	125
	126	expand_links.pl -- script to take hits from a smaller library and
	127	expand to complete library
	128
97	129	links2sql.pl -- create links for expand_links.pl
98	130
99	131	exp_up_ensg.pl -- expand uniprot sequences to include Ensembl splice variants

+453

-0

scripts/ann_exons_all.pl

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	# ann_exons_up_sql.pl gets an annotation file from fasta36 -V with a line of the form:
	20
	21	# gi|62822551|sp|P00502|GSTA1_RAT Glutathione S-transfer\n (at least from pir1.lseg)
	22	#
	23	# it must:
	24	# (1) read in the line
	25	# (2) parse it to get the up_acc
	26	# (3) return the tab delimited features
	27
	28	# this version can read feature2 uniprot features (acc/pos/end/label/value), but returns sorted start/end domains
	29	# modified 18-Jan-2016 to produce annotation symbols consistent with ann_exons_up_www2.pl
	30	# modified Dec 2018 to generate genomic coordinates with --gen_coord
	31	# modified 3-Jan-2019 to merge sql and www (--www) access to exon coordinates
	32
	33	use warnings;
	34	use strict;
	35
	36	use DBI;
	37	use Getopt::Long;
	38	use Pod::Usage;
	39	use LWP::Simple;
	40	use LWP::UserAgent;
	41	use JSON qw(decode_json);
	42
	43	use vars qw($host $db $a_table $port $user $pass);
	44
	45	my %domains = ();
	46	my $domain_cnt = 0;
	47
	48	my $hostname = `/bin/hostname`;
	49
	50	unless ($hostname =~ m/ebi/) {
	51	($host, $db, $a_table, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "uniprot", "annot2", 0, "web_user", "fasta_www");
	52	# $host = 'xdb';
	53	}
	54	else {
	55	($host, $db, $a_table, $port, $user, $pass) = ("mysql-pearson-prod", "up_db", "annot", 4124, "web_user", "fasta_www");
	56	}
	57
	58	my ($lav, $gen_coord, $exon_label, $use_www, $shelp, $help) = (0,0,0,0,0,0);
	59
	60	my ($show_color) = (1);
	61	my $color_sep_str = " :";
	62	$color_sep_str = '~';
	63
	64	GetOptions(
	65	"gen_coord|gene_coord!" => \$gen_coord,
	66	"exon_label|label_exons!" => \$exon_label,
	67	"www!" => \$use_www,
	68	"host=s" => \$host,
	69	"db=s" => \$db,
	70	"user=s" => \$user,
	71	"password=s" => \$pass,
	72	"port=i" => \$port,
	73	"lav" => \$lav,
	74	"h|?" => \$shelp,
	75	"help" => \$help,
	76	);
	77
	78	pod2usage(1) if $shelp;
	79	pod2usage(exitstatus => 0, verbose => 2) if $help;
	80	pod2usage(1) unless (@ARGV || -p STDIN || -f STDIN);
	81
	82	my $connect = "dbi:mysql(AutoCommit=>1,RaiseError=>1):database=$db";
	83	$connect .= ";host=$host" if $host;
	84	$connect .= ";port=$port" if $port;
	85
	86	my $dbh = DBI->connect($connect,
	87	$user,
	88	$pass
	89	) or die $DBI::errstr;
	90
	91
	92	my $get_annot_sub = \&get_annots;
	93
	94
	95	my $ua = LWP::UserAgent->new(ssl_opts=>{verify_hostname => 0});
	96	my $uniprot_url = 'https://www.ebi.ac.uk/proteins/api/coordinates/';
	97	my $uniprot_suff = ".json";
	98
	99
	100	if ($use_www) {
	101	$get_annot_sub = \&get_annots_up_www;
	102	}
	103
	104
	105	my $get_annots_id = $dbh->prepare(qq(select up_exons.* from up_exons join annot2 using(acc) where id=? order by ix));
	106	my $get_annots_acc = $dbh->prepare(qq(select up_exons.* from up_exons where acc=? order by ix));
	107	my $get_annots_refacc = $dbh->prepare(qq(select ref_acc, start, end, ix from up_exons join annot2 using(acc) where ref_acc=? order by ix));
	108	my $get_annots_refseq = $dbh->prepare(qq(select acc, ex_p_start as start, ex_p_end as end, ex_num as ix, chrom, g_start, g_end from seqdb_demo2.ref_exons where acc=? order by ix));
	109
	110	my $get_annots_sql = $get_annots_acc;
	111
	112	my ($tmp, $gi, $sdb, $acc, $id, $use_acc);
	113
	114	# get the query
	115	my ($query, $seq_len) = @ARGV;
	116	$seq_len = 0 unless defined($seq_len);
	117
	118	$query =~ s/^>// if ($query);
	119
	120	my @annots = ();
	121
	122	#if it's a file I can open, read and parse it
	123	unless ($query && ($query =~ m/[\|:]/ ||
	124	$query =~ m/^[NX]P_/ ||
	125	$query =~ m/^[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}\s/)) {
	126
	127	while (my $a_line = <>) {
	128	$a_line =~ s/^>//;
	129	chomp $a_line;
	130	push @annots, show_annots($a_line, $get_annot_sub, $use_www);
	131	}
	132	}
	133	else {
	134	push @annots, show_annots("$query\t$seq_len", $get_annot_sub, $use_www);
	135	}
	136
	137	for my $seq_annot (@annots) {
	138	print ">",$seq_annot->{seq_info},"\n";
	139	for my $annot (@{$seq_annot->{list}}) {
	140	if (!$lav && $show_color && defined($domains{$annot->[-1]})) {
	141	$annot->[-1] .= $color_sep_str.$domains{$annot->[-1]};
	142	}
	143	print join("\t",@$annot),"\n";
	144	}
	145	}
	146
	147	exit(0);
	148
	149	sub show_annots {
	150	my ($query_len, $get_annot_sub, $use_www) = @_;
	151
	152	my ($annot_line, $seq_len) = split(/\t/,$query_len);
	153
	154	my %annot_data = (seq_info=>$annot_line);
	155
	156	if ($annot_line =~ m/^gi\|/) {
	157	$use_acc = 1;
	158	($tmp, $gi, $sdb, $acc, $id) = split(/\|/,$annot_line);
	159	}
	160	elsif ($annot_line =~ m/^(SP|TR):(\w+) (\w+)/) {
	161	($sdb, $id, $acc) = ($1,$2,$3);
	162	$use_acc = 1;
	163	$sdb = lc($sdb)
	164	}
	165	elsif ($annot_line =~ m/^(SP|TR):(\w+)/) {
	166	($sdb, $id) = ($1,$2);
	167	$use_acc = 0;
	168	$sdb = lc($sdb)
	169	}
	170	elsif ($annot_line !~ m/\|/) { # new NCBI swissprot format
	171	$use_acc =1;
	172	if ($annot_line =~ m/[NXY]P_\d+/) {
	173	$sdb = 'ref';
	174	}
	175	else {
	176	$sdb = 'sp';
	177	}
	178	($acc) = split(/\s+/,$annot_line);
	179	}
	180	else {
	181	$use_acc = 1;
	182	($sdb, $acc, $id) = split(/\|/,$annot_line);
	183	}
	184
	185	unless ($use_acc) {
	186	$get_annots_sql = $get_annots_id;
	187	$get_annots_sql->execute($id);
	188	}
	189	else {
	190	if ($sdb =~ m/ref/) {
	191	$get_annots_sql = $get_annots_refseq;
	192	} else {
	193	$get_annots_sql = $get_annots_acc;
	194	}
	195	$acc =~ s/\.\d+$//;
	196
	197	unless ($use_www) {
	198	$get_annots_sql->execute($acc);
	199	}
	200	else {
	201	$get_annots_sql = $acc;
	202	}
	203	}
	204
	205	$annot_data{list} = $get_annot_sub->($get_annots_sql, $seq_len);
	206
	207	return \%annot_data;
	208	}
	209
	210	sub get_annots {
	211	my ($get_annots_sql, $seq_len) = @_;
	212
	213	my @feats = ();
	214
	215	while (my $exon_hr = $get_annots_sql->fetchrow_hashref()) {
	216	my $ix = $exon_hr->{ix};
	217	if ($lav) {
	218	push @feats, [$exon_hr->{start}, $exon_hr->{end}, "exon_$ix~$ix"];
	219	} else {
	220	my ($exon_info,$ex_info_start, $ex_info_end) = ("exon_$ix~$ix","","");
	221	if ($gen_coord) {
	222	if (defined($exon_hr->{g_start})) {
	223	my $chr=$exon_hr->{chrom};
	224	$chr = "unk" unless $chr;
	225	if ($chr =~ m/^\d+$/ || $chr =~m/^[XYZ]+$/) {
	226	$chr = "chr$chr";
	227	}
	228	$ex_info_start = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_start});
	229	$ex_info_end = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_end});
	230	if ($exon_label) {
	231	$exon_info = sprintf("exon_%d{%s:%d-%d}~%d",$ix, $chr, $exon_hr->{g_start}, $exon_hr->{g_end}, $ix);
	232	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	233	} else {
	234	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	235	push @feats, [$exon_hr->{start},'<','-',$ex_info_start];
	236	push @feats, [$exon_hr->{end},'>','-',$ex_info_end];
	237	}
	238	}
	239	} else {
	240	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	241	}
	242	}
	243	}
	244
	245	return \@feats;
	246	}
	247
	248	sub get_annots_up_www {
	249	my ($acc, $seq_len) = @_;
	250
	251	my @feats = ();
	252
	253	# my $exon_json = get_https($uniprot_url.$acc.$uniprot_suff);
	254	my $exon_json = get($uniprot_url.$acc.$uniprot_suff);
	255
	256	unless (!$exon_json || $exon_json =~ m/errorMessage/ || $exon_json =~ m/Can not find/) {
	257	return parse_json_up_exons($exon_json);
	258	}
	259	else {
	260	return ();
	261	}
	262	}
	263
	264	sub parse_json_up_exons {
	265	my ($exon_json) = @_;
	266
	267	my @exons = ();
	268	my @ex_coords = ();
	269
	270	my $acc_exons = decode_json($exon_json);
	271
	272	my $exon_num = 1;
	273	my $last_end = 0;
	274	my $last_phase = 0;
	275
	276	my $chrom = $acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'chromosome'};
	277	my $rev_strand = $acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'reverseStrand'};
	278
	279	for my $exon ( @{$acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'exon'}} ) {
	280	my ($p_begin, $p_end) = ($exon->{'proteinLocation'}{'begin'}{'position'},$exon->{'proteinLocation'}{'end'}{'position'});
	281	my ($g_begin, $g_end) = ($exon->{'genomeLocation'}{'begin'}{'position'},$exon->{'genomeLocation'}{'end'}{'position'});
	282
	283	my $this_phase = 0;
	284	if (defined($g_begin) && defined($g_end)) {
	285	$this_phase = ($g_end - $g_begin + 1) % 3;
	286	}
	287
	288	if (!defined($p_begin) || !defined($p_end)) {
	289	$exon_num++;
	290	$last_phase = 0;
	291	next;
	292	}
	293
	294	if ($p_end >= $p_begin) {
	295	if ($p_begin == $last_end) {
	296	if ($last_phase==2) {
	297	$p_begin += 1;
	298	}
	299	elsif ($last_phase==1) {
	300	$last_end -= 1;
	301	$exons[-1]->{end} -= 1;
	302	}
	303	}
	304
	305	if ($p_begin <= $last_end && $p_end > $last_end) {
	306	$p_begin = $last_end+1;
	307	}
	308	$last_end = $p_end;
	309	$last_phase = $this_phase;
	310
	311	my ($gs_begin, $gs_end) = ($g_begin, $g_end);
	312	if ($rev_strand) {
	313	($gs_begin, $gs_end) = ($g_end, $g_begin);
	314	}
	315
	316	push @exons, {
	317	ix=>$exon_num,
	318	start=>$p_begin,
	319	end=>$p_end,
	320	g_start=>$gs_begin,
	321	g_end=>$gs_end,
	322	chrom=>$chrom,
	323	};
	324
	325	$exon_num++;
	326	}
	327	}
	328
	329	# check for domain overlap, and resolve check for domain overlap
	330	# (possibly more than 2 domains), choosing the domain with the best
	331	# evalue
	332
	333	my @ex_feats = ();
	334
	335	for my $exon_hr (@exons) {
	336	my $ix = $exon_hr->{ix};
	337	if ($lav) {
	338	push @ex_feats, [$exon_hr->{start}, $exon_hr->{end}, "exon_$ix~$ix" ];
	339	}
	340	else {
	341	my ($exon_info,$ex_info_start, $ex_info_end) = ("exon_$ix~$ix","","");
	342	if ($gen_coord) {
	343	if (defined($exon_hr->{g_start})) {
	344	my $chr=$exon_hr->{chrom};
	345	$chr = "unk" unless $chr;
	346	if ($chr =~ m/^\d+$/ || $chr =~m/^[XYZ]+$/) {
	347	$chr = "chr$chr";
	348	}
	349	$ex_info_start = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_start});
	350	$ex_info_end = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_end});
	351	if ($exon_label) {
	352	$exon_info = sprintf("exon_%d{%s:%d-%d}~%d",$ix, $chr, $exon_hr->{g_start}, $exon_hr->{g_end},$ix);
	353	push @ex_feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	354	} else {
	355	push @ex_feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	356	push @ex_feats, [$exon_hr->{start},'<','-',$ex_info_start];
	357	push @ex_feats, [$exon_hr->{end},'>','-',$ex_info_end];
	358	}
	359	}
	360	} else {
	361	push @ex_feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	362	}
	363	}
	364	}
	365	return \@ex_feats;
	366	}
	367
	368	sub get_https {
	369	my ($url) = @_;
	370
	371	my $result = "";
	372	my $response = $ua->get($url);
	373
	374	if ($response->is_success) {
	375	$result = $response->decoded_content;
	376	} else {
	377	$result = '';
	378	}
	379	return $result;
	380	}
	381
	382
	383
	384	__END__
	385
	386	=pod
	387
	388	=head1 NAME
	389
	390	ann_exons_up_sql.pl
	391
	392	=head1 SYNOPSIS
	393
	394	ann_exons_up_sql.pl --lav 'sp|P09488|GSTM1_HUMAN' | accession.file
	395
	396	=head1 OPTIONS
	397
	398	-h short help
	399	--help include description
	400	--gen_coord -- provide genomic exon start/stop coordinates as features
	401	--lav produce lav2plt.pl annotation format, only show domains/repeats
	402	--host, --user, --password, --port --db -- info for mysql database
	403
	404	=head1 DESCRIPTION
	405
	406	C<ann_exons_all.pl> extracts exon location information from mysql
	407	databases (uniprot for Uniprot proteins, seqdb_demo2 for refseq) built
	408	from EBI/proteins API data (Uniprot) or Refseq GFF data (refseq).
	409
	410	Given a command line argument that contains a sequence accession
	411	(P09488) or identifier (GSTM1_HUMAN), the program looks up the
	412	features available for that sequence and returns them in a
	413	tab-delimited format:
	414
	415	>sp|P09488|GSTM1_HUMAN
	416	1 - 12 exon_1~1
	417	13 - 38 exon_2~2
	418	39 - 59 exon_3~3
	419	60 - 87 exon_4~4
	420	88 - 120 exon_5~5
	421	121 - 152 exon_6~6
	422	153 - 189 exon_7~7
	423	190 - 218 exon_8~8
	424
	425	C<ann_exons_all.pl --gen_coord 'sp|P09488|GSTM1_HUMAN'> also provides genomic coordinates:
	426
	427	>sp|P09488|GSTM1_HUMAN
	428	1 - 12 exon_1~1
	429	1 < - exon_1::chr1:109687874
	430	12 > - exon_1::chr1:109687909
	431	13 - 37 exon_2~2
	432	13 < - exon_2::chr1:109688170
	433	37 > - exon_2::chr1:109688245
	434	38 - 59 exon_3~3
	435	38 < - exon_3::chr1:109688673
	436	59 > - exon_3::chr1:109688737
	437	...
	438	190 - 218 exon_8~8
	439	190 < - exon_8::chr1:109693206
	440	218 > - exon_8::chr1:109693292
	441
	442	C<ann_exons_all.pl> is designed to be used by the B<FASTA> programs
	443	with the C<-V \!ann_exons_all.pl> option, or by the
	444	C<annot_blast_btop.pl> script. It can also be used with the
	445	lav2plt.pl program with the C<--xA "\!ann_exons_all.pl --lav"> or
	446	C<--yA "\!ann_exons_all.pl --lav"> options.
	447
	448	=head1 AUTHOR
	449
	450	William R. Pearson, wrp@virginia.edu
	451
	452	=cut
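
The annotation format documented in the POD above can be read back with a few lines of Perl. This is a minimal sketch, assuming the fields are tab-separated, as produced by the script's `join("\t",@$annot)` output loop. `parse_exon_annots` is a hypothetical helper for illustration and is not part of the FASTA package.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the FASTA package): parse the
# tab-delimited annotation text into a hash keyed by sequence id.
sub parse_exon_annots {
    my ($text) = @_;
    my (%annots, $seq_id);

    for my $line (split /\n/, $text) {
        if ($line =~ m/^>(\S+)/) {      # header line, e.g. >sp|P09488|GSTM1_HUMAN
            $seq_id = $1;
            $annots{$seq_id} = { exons => [], bounds => [] };
            next;
        }
        next unless defined $seq_id;

        my @f = split /\t/, $line;
        next unless @f == 4;

        if ($f[1] eq '-') {             # exon range: start, '-', end, label~color
            my ($label, $color) = split /~/, $f[3], 2;
            push @{ $annots{$seq_id}{exons} },
                { start => $f[0], end => $f[2], label => $label, color => $color };
        }
        elsif ($f[1] eq '<' || $f[1] eq '>') {   # --gen_coord boundary line
            if ($f[3] =~ m/^(exon_\d+)::([^:]+):(\d+)$/) {
                push @{ $annots{$seq_id}{bounds} },
                    { pos => $f[0], side => $f[1], label => $1, chrom => $2, g_pos => $3 };
            }
        }
    }
    return \%annots;
}

# Example input copied from the --gen_coord output shown in the POD.
my $text = join("\n",
    ">sp|P09488|GSTM1_HUMAN",
    join("\t", 1,  '-', 12,  'exon_1~1'),
    join("\t", 1,  '<', '-', 'exon_1::chr1:109687874'),
    join("\t", 12, '>', '-', 'exon_1::chr1:109687909'),
) . "\n";

my $annots = parse_exon_annots($text);
for my $exon (@{ $annots->{'sp|P09488|GSTM1_HUMAN'}{exons} }) {
    print "$exon->{label}: $exon->{start}-$exon->{end}\n";
}
```

The sketch keeps exon ranges and genomic boundary lines in separate lists because the two line types share the four-column layout but put different values in columns 2 and 3.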

+2

-1

scripts/ann_exons_ens.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

28	28	# (3) return the tab delimited exon boundaries
29	29
30	30
	31	use warnings;
31	32	use strict;
32	33
33	34	use DBI;

+23

-54

scripts/ann_exons_ncbi.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	# ann_exons_ncbi.pl gets an annotation file from fasta36 -V with a line of the form:
3	3
4		# gi|23065544|ref|NP_000552.2|
	4	# gi|23065544|ref|NP_000552.2| or
	5	# NP_000552
5	6	#
6	7	# and returns the exons present in the protein from NCBI gff3 tables (human, mouse, rat, xtrop)
7	8	#

11	12	# (3) return the tab delimited exon boundaries
12	13	#
13	14
	15	use warnings;
14	16	use strict;
15	17
16	18	use DBI;

23	25
24	26	($host, $db, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "seqdb_demo2", 0, "web_user", "fasta_www");
25	27
26		my ($auto_reg,$rpd2_fams, $neg_doms, $lav, $no_doms, $pf_acc, $shelp, $help) = (0, 0, 0, 0,0, 0,0,0);
27		my ($min_nodom) = (10);
	28	my ($lav, $shelp, $help) = (0, 0, 0);
28	29
29	30	my $color_sep_str = " :";
30	31	$color_sep_str = '~';

36	37	"password=s" => \$pass,
37	38	"port=i" => \$port,
38	39	"lav" => \$lav,
39		"neg" => \$neg_doms,
40		"neg_doms" => \$neg_doms,
41		"neg-doms" => \$neg_doms,
42		"min_nodom=i" => \$min_nodom,
43		"pfacc" => \$pf_acc,
44		"RPD2" => \$rpd2_fams,
45		"auto_reg" => \$auto_reg,
46	40	"h|?" => \$shelp,
47	41	"help" => \$help,
48	42	);

130	124	elsif ($annot_line =~ m/^ref\|/) {
131	125	($sdb, $acc) = split(/\|/,$annot_line);
132	126	}
	127	else {
	128	$acc = $annot_line;
	129	}
133	130
134	131	$acc =~ s/\.\d+$//;
135	132	$get_annots_sql->execute($acc);

147	144	# get the list of domains, sorted by start
148	145	while ( my $row_href = $get_annots->fetchrow_hashref()) {
149	146
150		$row_href->{info} = "exon_".$row_href->{ex_num};
	147	$row_href->{info} = "exon_".$row_href->{ex_num}.$color_sep_str.$row_href->{ex_num};
151	148	push @exons, $row_href
152	149	}
153	150

171	168	return \@feats;
172	169	}
173	170
174		# domain name takes a uniprot domain label, removes comments ( ;
175		# truncated) and numbers and returns a canonical form. Thus:
176		# Cortactin 6.
177		# Cortactin 7; truncated.
178		# becomes "Cortactin"
179		#
180
181		sub domain_name {
182
183		my ($value) = @_;
184
185		if (!defined($domains{$value})) {
186		$domain_cnt++;
187		$domains{$value} = $domain_cnt;
188		}
189		return $value;
190		}
191
192	171	__END__
193	172
194	173	=pod
195	174
196	175	=head1 NAME
197	176
198		ann_feats.pl
	177	ann_exons_ncbi.pl
199	178
200	179	=head1 SYNOPSIS
201	180
202		ann_pfam.pl --neg-doms 'sp|P09488|GSTM1_NUMAN' | accession.file
	181	ann_exons_ncbi.pl NP_000552
203	182
204	183	=head1 OPTIONS
205	184
206	185	-h short help
207	186	--help include description
208		--neg-doms, -- report domains between annotated domains as NODOM
209		(also --neg, --neg_doms)
210		--min_nodom=10 -- minimum length between domains for NODOM
211
	187	--lav produce lav2plt.pl annotation format, only show domains/repeats
212	188	--host, --user, --password, --port --db -- info for mysql database
213	189
214	190	=head1 DESCRIPTION
215	191
216		C<ann_pfam.pl> extracts domain information from a msyql
	192	C<ann_exons_ncbi.pl> extracts domain information from a mysql
217	193	database. Currently, the program works with database sequence
218	194	descriptions in one of two formats:
219	195
220		>pf26|649|O94823|AT10B_HUMAN -- RPD2_seqs
221
222		(pf26 databases have auto_pfamseq in the second field) and
223
224		>gi|1705556|sp|P54670.1|CAF1_DICDI
225
226		C<ann_pfam.pl> uses the C<pfamA_reg_full_significant>, C<pfamseq>,
227		and C<pfamA> tables of the C<pfam> database to extract domain
228		information on a protein. For proteins that have multiple domains
229		associated with the same overlapping region (domains overlap by more
230		than 1/3 of the domain length), C<auto_pfam.pl> selects the domain
231		annotation with the best C<domain_evalue_score>. When domains overlap
232		by less than 1/3 of the domain length, they are shortened to remove
233		the overlap.
234
235		C<ann_pfam.pl> is designed to be used by the B<FASTA> programs with
236		the C<-V \!ann_pfam.pl> or C<-V "\!ann_pfam.pl --neg"> option.
	196	>gi|23065544|ref|NP_000552.2| or
	197	>NP_000552
	198
	199	C<ann_exons_ncbi.pl> uses the C<ref_exons> table of the C<seqdb2>
	200	database to extract exon position information on a protein. The
	201	C<seqdb2/ref_exons> table is constructed from refseq gff files using
	202	the C<ncbi_refseq_ex2prot.pl> script.
	203
	204	C<ann_exons_ncbi.pl> is designed to be used by the B<FASTA> programs with
	205	the C<-V \!ann_exons_ncbi.pl> option.
237	206
238	207	=head1 AUTHOR
239	208

+66

-121

scripts/ann_exons_up_sql.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

24	24	# (1) read in the line
25	25	# (2) parse it to get the up_acc
26	26	# (3) return the tab delimited features
27		#
28	27
29	28	# this version can read feature2 uniprot features (acc/pos/end/label/value), but returns sorted start/end domains
30	29	# modified 18-Jan-2016 to produce annotation symbols consistent with ann_exons_up_www2.pl
31	30
	31	use warnings;
32	32	use strict;
33	33
34	34	use DBI;

50	50	($host, $db, $a_table, $port, $user, $pass) = ("mysql-pearson-prod", "up_db", "annot", 4124, "web_user", "fasta_www");
51	51	}
52	52
53		my ($sstr, $lav, $neg_doms, $no_vars, $no_doms, $no_feats, $shelp, $help, $pfam26) = (0,0,0,0,0,0,0,0,0,0);
54		my ($min_nodom) = (10);
	53	my ($lav, $gen_coord, $shelp, $help) = (0,0,0,0);
55	54
56	55	my ($show_color) = (1);
57	56	my $color_sep_str = " :";
58	57	$color_sep_str = '~';
59	58
60	59	GetOptions(
	60	"gen_coord!" => \$gen_coord,
61	61	"host=s" => \$host,
62	62	"db=s" => \$db,
63	63	"user=s" => \$user,
64	64	"password=s" => \$pass,
65	65	"port=i" => \$port,
66	66	"lav" => \$lav,
67		"no_doms" => \$no_doms,
68		"no-doms" => \$no_doms,
69		"nodoms" => \$no_doms,
70		"no_var" => \$no_vars,
71		"no-var" => \$no_vars,
72		"novar" => \$no_vars,
73		"neg" => \$neg_doms,
74		"neg_doms" => \$neg_doms,
75		"neg-doms" => \$neg_doms,
76		"negdoms" => \$neg_doms,
77		"min_nodom=i" => \$min_nodom,
78		"min-nodom=i" => \$min_nodom,
79		"no_feats" => \$no_feats,
80		"no-feats" => \$no_feats,
81		"nofeats" => \$no_feats,
82		"color!" => \$show_color,
83		"sstr" => \$sstr,
84	67	"h|?" => \$shelp,
85	68	"help" => \$help,
86	69	);

99	82	) or die $DBI::errstr;
100	83
101	84
102		my $get_annot_sub = \&get_fasta_annots;
103		if ($lav) {
104		$no_feats = 1;
105		$get_annot_sub = \&get_lav_annots;
106		}
107
108		my $get_annots_id = $dbh->prepare(qq(select acc, start, end, ix from up_exons join annot2 using(acc) where id=? order by ix));
109		my $get_annots_acc = $dbh->prepare(qq(select acc, start, end, ix from up_exons where acc=? order by ix));
	85	my $get_annot_sub = \&get_annots;
	86
	87	my $get_annots_id = $dbh->prepare(qq(select up_exons.* from up_exons join annot2 using(acc) where id=? order by ix));
	88	my $get_annots_acc = $dbh->prepare(qq(select up_exons.* from up_exons where acc=? order by ix));
110	89	my $get_annots_refacc = $dbh->prepare(qq(select ref_acc, start, end, ix from up_exons join annot2 using(acc) where ref_acc=? order by ix));
111	90
112	91	my $get_annots_sql = $get_annots_acc;

199	178	return \%annot_data;
200	179	}
201	180
202		sub get_fasta_annots {
	181	sub get_annots {
203	182	my ($get_annots_sql, $seq_len) = @_;
204	183
205		my ($acc, $start, $end, $ix);
206	184	my @feats = ();
207	185
208		while (($acc, $start, $end, $ix) = $get_annots_sql->fetchrow_array()) {
209		push @feats, [$start, "-", $end, "exon_$ix~$ix"];
	186	while (my $exon_hr = $get_annots_sql->fetchrow_hashref()) {
	187	my $ix = $exon_hr->{ix};
	188	if ($lav) {
	189	push @feats, [$exon_hr->{start}, $exon_hr->{end}, "exon_$ix~$ix"];
	190	}
	191	else {
	192	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, "exon_$ix~$ix"];
	193	if ($gen_coord) {
	194	if (not defined($exon_hr->{g_start})) {
	195	next;
	196	}
	197
	198	my $chr=$exon_hr->{chrom};
	199	$chr = "unk" unless $chr;
	200	if ($chr =~ m/^\d+$/ || $chr =~m/^[XYZ]+$/) {
	201	$chr = "chr$chr";
	202	}
	203	my $ex_info = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_start});
	204	push @feats, [$exon_hr->{start},'<','-',$ex_info];
	205	$ex_info = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_end});
	206	push @feats, [$exon_hr->{end},'>','-',$ex_info];
	207	}
	208	}
210	209	}
211	210
212	211	return \@feats;
213	212	}
214	213
215		sub get_lav_annots {
216		my ($get_annots_sql, $seq_len) = @_;
217
218		my ($pos, $end, $label, $value, $comment);
219
220		my @feats = ();
221
222		my %annot = ();
223		while (($acc, $pos, $end, $label, $value) = $get_annots_sql->fetchrow_array()) {
224		next unless ($label =~ m/^DOMAIN/ || $label =~ m/^REPEAT/);
225		$value =~ s/\s?\{.+\}\.?$//;
226		$value = domain_name($label,$value);
227		push @feats, [$pos, $end, $value];
228		}
229
230		return \@feats;
231		}
232
233		# domain name takes a uniprot domain label, removes comments ( ;
234		# truncated) and numbers and returns a canonical form. Thus:
235		# Cortactin 6.
236		# Cortactin 7; truncated.
237		# becomes "Cortactin"
238		#
239
240		sub domain_name {
241
242		my ($label, $value) = @_;
243
244		if ($label =~ /DOMAIN|REPEAT/) {
245		$value =~ s/;.*$//;
246		$value =~ s/\s+\d+\.?$//;
247		$value =~ s/\.\s*$//;
248		$value =~ s/\s+\d+\.\s+.*$//;
249		$value =~ s/\s+/_/;
250		if (!defined($domains{$value})) {
251		$domain_cnt++;
252		$domains{$value} = $domain_cnt;
253		}
254		return $value;
255		}
256		else {
257		return $value;
258		}
259		}
260
261	214	__END__
262	215
263	216	=pod

268	221
269	222	=head1 SYNOPSIS
270	223
271		ann_exons_up_sql.pl --no_doms --no_feats --lav 'sp|P09488|GSTM1_NUMAN' | accession.file
	224	ann_exons_up_sql.pl --lav 'sp|P09488|GSTM1_HUMAN' | accession.file
272	225
273	226	=head1 OPTIONS
274	227
275	228	-h short help
276	229	--help include description
277		--no-doms do not show domain boundaries (domains are always shown with --lav)
278		--no-feats do not show features (variants, active sites, phospho-sites)
279		--no-var do not show variant sites (--no_var, --novar)
	230	--gen_coord -- provide genomic exon start/stop coordinates as features
280	231	--lav produce lav2plt.pl annotation format, only show domains/repeats
281		--neg-doms, -- report domains between annotated domains as NODOM
282		(also --neg, --neg_doms)
283		--min_nodom=10 minimum non-domain length to produce NODOM
284	232	--host, --user, --password, --port --db -- info for mysql database
285	233
286	234	=head1 DESCRIPTION
287	235
288		C<ann_exons_up_sql.pl> extracts feature, domain, and repeat information from
289		a msyql database (default name, uniprot) built by parsing the
290		uniprot_sprot.dat and uniprot_trembl.dat feature tables. Given a
291		command line argument that contains a sequence accession (P09488) or
292		identifier (GSTM1_HUMAN), the program looks up the features available
293		for that sequence and returns them in a tab-delimited format:
	236	C<ann_exons_up_sql.pl> extracts exon location information from
	237	a mysql database (default name, uniprot) built from EBI/proteins API data.
	238
	239	Given a command line argument that contains a sequence accession
	240	(P09488) or identifier (GSTM1_HUMAN), the program looks up the
	241	features available for that sequence and returns them in a
	242	tab-delimited format:
294	243
295	244	>sp|P09488|GSTM1_HUMAN
296		2 - 88 GST_N-terminal~1
297		7 V F Mutagen: Reduces catalytic activity 100- fold. {ECO:0000269|PubMed:16548513}.
298		34 * - MOD_RES: Phosphothreonine. {ECO:0000250|UniProtKB:P10649}.
299		90 - 208 GST_C-terminal~2
300		108 V S Mutagen: Changes the properties of the enzyme toward some substrates. {ECO:0000269|PubMed:16548513, ECO:0000269|PubMed:9930979}.
301		108 V Q Mutagen: Reduces catalytic activity by half. {ECO:0000269|PubMed:16548513, ECO:0000269|PubMed:9930979}.
302		109 V I Mutagen: Reduces catalytic activity by half. {ECO:0000269|PubMed:16548513}.
303		116 # - BINDING: Substrate.
304		116 V A Mutagen: Reduces catalytic activity 10-fold. {ECO:0000269|PubMed:16548513}.
305		116 V F Mutagen: Slight increase of catalytic activity. {ECO:0000269|PubMed:16548513}.
306		173 V N in allele GSTM1B; dbSNP:rs1065411. {ECO:0000269|Ref.3, ECO:0000269|Ref.5}.
307		210 * - MOD_RES: Phosphoserine. {ECO:0000250|UniProtKB:P04905}.
308		210 V T in dbSNP:rs449856.
309
310		If features are provided, then a legend of feature symbols is provided
311		as well:
312
313		==:Active site
314		=*:Modified
315		=#:Substrate binding
316		=^:Site
317		=!:Metal binding
318
319		If the C<--lav> option is specified, domain and repeat features are
320		presented in a different format for the C<lav2plt.pl> program:
321
322		>sp|P09488|GSTM1_HUMAN
323		2 88 GST N-terminal.
324		90 208 GST C-terminal.
	245	1 - 12 exon_1~1
	246	13 - 38 exon_2~2
	247	39 - 59 exon_3~3
	248	60 - 87 exon_4~4
	249	88 - 120 exon_5~5
	250	121 - 152 exon_6~6
	251	153 - 189 exon_7~7
	252	190 - 218 exon_8~8
	253
	254	C<ann_exons_up_sql.pl --gen_coord 'sp|P09488|GSTM1_HUMAN'> also provides genomic coordinates:
	255
	256	>sp|P09488|GSTM1_HUMAN
	257	1 - 12 exon_1~1
	258	1 < - exon_1::chr1:109687874
	259	12 > - exon_1::chr1:109687909
	260	13 - 37 exon_2~2
	261	13 < - exon_2::chr1:109688170
	262	37 > - exon_2::chr1:109688245
	263	38 - 59 exon_3~3
	264	38 < - exon_3::chr1:109688673
	265	59 > - exon_3::chr1:109688737
	266	...
	267	190 - 218 exon_8~8
	268	190 < - exon_8::chr1:109693206
	269	218 > - exon_8::chr1:109693292
325	270
326	271	C<ann_exons_up_sql.pl> is designed to be used by the B<FASTA> programs
327	272	with the C<-V \!ann_exons_up_sql.pl> option, or by the

+446

-0

scripts/ann_exons_up_sql_www.pl

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	# ann_exons_up_sql.pl gets an annotation file from fasta36 -V with a line of the form:
	20
	21	# gi|62822551|sp|P00502|GSTA1_RAT Glutathione S-transfer\n (at least from pir1.lseg)
	22	#
	23	# it must:
	24	# (1) read in the line
	25	# (2) parse it to get the up_acc
	26	# (3) return the tab delimited features
	27
	28	# this version can read feature2 uniprot features (acc/pos/end/label/value), but returns sorted start/end domains
	29	# modified 18-Jan-2016 to produce annotation symbols consistent with ann_exons_up_www2.pl
	30	# modified Dec 2018 to generate genomic coordinates with --gen_coord
	31	# modified 3-Jan-2019 to merge sql and www (--www) access to exon coordinates
	32
	33	use warnings;
	34	use strict;
	35
	36	use DBI;
	37	use Getopt::Long;
	38	use Pod::Usage;
	39	use LWP::Simple;
	40	use LWP::UserAgent;
	41	use JSON qw(decode_json);
	42
	43	use vars qw($host $db $a_table $port $user $pass);
	44
	45	my %domains = ();
	46	my $domain_cnt = 0;
	47
	48	my $hostname = `/bin/hostname`;
	49
	50	unless ($hostname =~ m/ebi/) {
	51	($host, $db, $a_table, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "uniprot", "annot2", 0, "web_user", "fasta_www");
	52	# $host = 'xdb';
	53	}
	54	else {
	55	($host, $db, $a_table, $port, $user, $pass) = ("mysql-pearson-prod", "up_db", "annot", 4124, "web_user", "fasta_www");
	56	}
	57
	58	my ($lav, $gen_coord, $exon_label, $use_www, $shelp, $help) = (0,0,0,0,0,0);
	59
	60	my ($show_color) = (1);
	61	my $color_sep_str = " :";
	62	$color_sep_str = '~';
	63
	64	GetOptions(
	65	"gen_coord|gene_coord!" => \$gen_coord,
	66	"exon_label|label_exons!" => \$exon_label,
	67	"www!" => \$use_www,
	68	"host=s" => \$host,
	69	"db=s" => \$db,
	70	"user=s" => \$user,
	71	"password=s" => \$pass,
	72	"port=i" => \$port,
	73	"lav" => \$lav,
	74	"h|?" => \$shelp,
	75	"help" => \$help,
	76	);
	77
	78	pod2usage(1) if $shelp;
	79	pod2usage(exitstatus => 0, verbose => 2) if $help;
	80	pod2usage(1) unless (@ARGV || -p STDIN || -f STDIN);
	81
	82	my $connect = "dbi:mysql(AutoCommit=>1,RaiseError=>1):database=$db";
	83	$connect .= ";host=$host" if $host;
	84	$connect .= ";port=$port" if $port;
	85
	86	my $dbh = DBI->connect($connect,
	87	$user,
	88	$pass
	89	) or die $DBI::errstr;
	90
	91
	92	my $get_annot_sub = \&get_annots;
	93
	94
	95	my $ua = LWP::UserAgent->new(ssl_opts=>{verify_hostname => 0});
	96	my $uniprot_url = 'https://www.ebi.ac.uk/proteins/api/coordinates/';
	97	my $uniprot_suff = ".json";
	98
	99
	100	if ($use_www) {
	101	$get_annot_sub = \&get_annots_up_www;
	102	}
	103
	104
	105	my $get_annots_id = $dbh->prepare(qq(select up_exons.* from up_exons join annot2 using(acc) where id=? order by ix));
	106	my $get_annots_acc = $dbh->prepare(qq(select up_exons.* from up_exons where acc=? order by ix));
	107	my $get_annots_refacc = $dbh->prepare(qq(select ref_acc, start, end, ix from up_exons join annot2 using(acc) where ref_acc=? order by ix));
	108
	109	my $get_annots_sql = $get_annots_acc;
	110
	111	my ($tmp, $gi, $sdb, $acc, $id, $use_acc);
	112
	113	# get the query
	114	my ($query, $seq_len) = @ARGV;
	115	$seq_len = 0 unless defined($seq_len);
	116
	117	$query =~ s/^>// if ($query);
	118
	119	my @annots = ();
	120
	121	#if it's a file I can open, read and parse it
	122	unless ($query && ($query =~ m/[\|:]/ ||
	123	$query =~ m/^[NX]P_/ ||
	124	$query =~ m/^[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}\s/)) {
	125
	126	while (my $a_line = <>) {
	127	$a_line =~ s/^>//;
	128	chomp $a_line;
	129	push @annots, show_annots($a_line, $get_annot_sub, $use_www);
	130	}
	131	}
	132	else {
	133	push @annots, show_annots("$query\t$seq_len", $get_annot_sub, $use_www);
	134	}
	135
	136	for my $seq_annot (@annots) {
	137	print ">",$seq_annot->{seq_info},"\n";
	138	for my $annot (@{$seq_annot->{list}}) {
	139	if (!$lav && $show_color && defined($domains{$annot->[-1]})) {
	140	$annot->[-1] .= $color_sep_str.$domains{$annot->[-1]};
	141	}
	142	print join("\t",@$annot),"\n";
	143	}
	144	}
	145
	146	exit(0);
	147
	148	sub show_annots {
	149	my ($query_len, $get_annot_sub, $use_www) = @_;
	150
	151	my ($annot_line, $seq_len) = split(/\t/,$query_len);
	152
	153	my %annot_data = (seq_info=>$annot_line);
	154
	155	if ($annot_line =~ m/^gi\|/) {
	156	$use_acc = 1;
	157	($tmp, $gi, $sdb, $acc, $id) = split(/\|/,$annot_line);
	158	}
	159	elsif ($annot_line =~ m/^(SP|TR):(\w+) (\w+)/) {
	160	($sdb, $id, $acc) = ($1,$2,$3);
	161	$use_acc = 1;
	162	$sdb = lc($sdb)
	163	}
	164	elsif ($annot_line =~ m/^(SP|TR):(\w+)/) {
	165	($sdb, $id) = ($1,$2);
	166	$use_acc = 0;
	167	$sdb = lc($sdb)
	168	}
	169	elsif ($annot_line !~ m/\|/) { # new NCBI swissprot format
	170	$use_acc =1;
	171	$sdb = 'sp';
	172	($acc) = split(/\s+/,$annot_line);
	173	}
	174	else {
	175	$use_acc = 1;
	176	($sdb, $acc, $id) = split(/\|/,$annot_line);
	177	}
	178
	179	unless ($use_acc) {
	180	$get_annots_sql = $get_annots_id;
	181	$get_annots_sql->execute($id);
	182	}
	183	else {
	184	unless ($sdb =~ m/ref/) {
	185	$get_annots_sql = $get_annots_acc;
	186	} else {
	187	$get_annots_sql = $get_annots_refacc;
	188	}
	189	$acc =~ s/\.\d+$//;
	190
	191	unless ($use_www) {
	192	$get_annots_sql->execute($acc);
	193	}
	194	else {
	195	$get_annots_sql = $acc;
	196	}
	197	}
	198
	199	$annot_data{list} = $get_annot_sub->($get_annots_sql, $seq_len);
	200
	201	return \%annot_data;
	202	}
	203
	204	sub get_annots {
	205	my ($get_annots_sql, $seq_len) = @_;
	206
	207	my @feats = ();
	208
	209	while (my $exon_hr = $get_annots_sql->fetchrow_hashref()) {
	210	my $ix = $exon_hr->{ix};
	211	if ($lav) {
	212	push @feats, [$exon_hr->{start}, $exon_hr->{end}, "exon_$ix~$ix"];
	213	} else {
	214	my ($exon_info,$ex_info_start, $ex_info_end) = ("exon_$ix~$ix","","");
	215	if ($gen_coord) {
	216	if (defined($exon_hr->{g_start})) {
	217	my $chr=$exon_hr->{chrom};
	218	$chr = "unk" unless $chr;
	219	if ($chr =~ m/^\d+$/ || $chr =~m/^[XYZ]+$/) {
	220	$chr = "chr$chr";
	221	}
	222	$ex_info_start = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_start});
	223	$ex_info_end = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_end});
	224	if ($exon_label) {
	225	$exon_info = sprintf("exon_%d{%s:%d-%d}~%d",$ix, $chr, $exon_hr->{g_start}, $exon_hr->{g_end}, $ix);
	226	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	227	} else {
	228	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	229	push @feats, [$exon_hr->{start},'<','-',$ex_info_start];
	230	push @feats, [$exon_hr->{end},'>','-',$ex_info_end];
	231	}
	232	}
	233	} else {
	234	push @feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	235	}
	236	}
	237	}
	238
	239	return \@feats;
	240	}
	241
	242	sub get_annots_up_www {
	243	my ($acc, $seq_len) = @_;
	244
	245	my @feats = ();
	246
	247	# my $exon_json = get_https($uniprot_url.$acc.$uniprot_suff);
	248	my $exon_json = get($uniprot_url.$acc.$uniprot_suff);
	249
	250	unless (!$exon_json || $exon_json =~ m/errorMessage/ || $exon_json =~ m/Can not find/) {
	251	return parse_json_up_exons($exon_json);
	252	}
	253	else {
	254	return ();
	255	}
	256	}
	257
	258	sub parse_json_up_exons {
	259	my ($exon_json) = @_;
	260
	261	my @exons = ();
	262	my @ex_coords = ();
	263
	264	my $acc_exons = decode_json($exon_json);
	265
	266	my $exon_num = 1;
	267	my $last_end = 0;
	268	my $last_phase = 0;
	269
	270	my $chrom = $acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'chromosome'};
	271	my $rev_strand = $acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'reverseStrand'};
	272
	273	for my $exon ( @{$acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'exon'}} ) {
	274	my ($p_begin, $p_end) = ($exon->{'proteinLocation'}{'begin'}{'position'},$exon->{'proteinLocation'}{'end'}{'position'});
	275	my ($g_begin, $g_end) = ($exon->{'genomeLocation'}{'begin'}{'position'},$exon->{'genomeLocation'}{'end'}{'position'});
	276
	277	my $this_phase = 0;
	278	if (defined($g_begin) && defined($g_end)) {
	279	$this_phase = ($g_end - $g_begin + 1) % 3;
	280	}
	281
	282	if (!defined($p_begin) || !defined($p_end)) {
	283	$exon_num++;
	284	$last_phase = 0;
	285	next;
	286	}
	287
	288	if ($p_end >= $p_begin) {
	289	if ($p_begin == $last_end) {
	290	if ($last_phase==2) {
	291	$p_begin += 1;
	292	}
	293	elsif ($last_phase==1) {
	294	$last_end -= 1;
	295	$exons[-1]->{end} -= 1;
	296	}
	297	}
	298
	299	if ($p_begin <= $last_end && $p_end > $last_end) {
	300	$p_begin = $last_end+1;
	301	}
	302	$last_end = $p_end;
	303	$last_phase = $this_phase;
	304
	305	my ($gs_begin, $gs_end) = ($g_begin, $g_end);
	306	if ($rev_strand) {
	307	($gs_begin, $gs_end) = ($g_end, $g_begin);
	308	}
	309
	310	push @exons, {
	311	ix=>$exon_num,
	312	start=>$p_begin,
	313	end=>$p_end,
	314	g_start=>$gs_begin,
	315	g_end=>$gs_end,
	316	chrom=>$chrom,
	317	};
	318
	319	$exon_num++;
	320	}
	321	}
	322
	323	# check for domain overlap, and resolve check for domain overlap
	324	# (possibly more than 2 domains), choosing the domain with the best
	325	# evalue
	326
	327	my @ex_feats = ();
	328
	329	for my $exon_hr (@exons) {
	330	my $ix = $exon_hr->{ix};
	331	if ($lav) {
	332	push @ex_feats, [$exon_hr->{start}, $exon_hr->{end}, "exon_$ix~$ix" ];
	333	}
	334	else {
	335	my ($exon_info,$ex_info_start, $ex_info_end) = ("exon_$ix~$ix","","");
	336	if ($gen_coord) {
	337	if (defined($exon_hr->{g_start})) {
	338	my $chr=$exon_hr->{chrom};
	339	$chr = "unk" unless $chr;
	340	if ($chr =~ m/^\d+$/ || $chr =~m/^[XYZ]+$/) {
	341	$chr = "chr$chr";
	342	}
	343	$ex_info_start = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_start});
	344	$ex_info_end = sprintf("exon_%d::%s:%d",$ix, $chr, $exon_hr->{g_end});
	345	if ($exon_label) {
	346	$exon_info = sprintf("exon_%d{%s:%d-%d}~%d",$ix, $chr, $exon_hr->{g_start}, $exon_hr->{g_end},$ix);
	347	push @ex_feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	348	} else {
	349	push @ex_feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	350	push @ex_feats, [$exon_hr->{start},'<','-',$ex_info_start];
	351	push @ex_feats, [$exon_hr->{end},'>','-',$ex_info_end];
	352	}
	353	}
	354	} else {
	355	push @ex_feats, [$exon_hr->{start}, "-", $exon_hr->{end}, $exon_info];
	356	}
	357	}
	358	}
	359	return \@ex_feats;
	360	}
	361
	362	sub get_https {
	363	my ($url) = @_;
	364
	365	my $result = "";
	366	my $response = $ua->get($url);
	367
	368	if ($response->is_success) {
	369	$result = $response->decoded_content;
	370	} else {
	371	$result = '';
	372	}
	373	return $result;
	374	}
	375
	376
	377
	378	__END__
	379
	380	=pod
	381
	382	=head1 NAME
	383
	384	ann_exons_up_sql.pl
	385
	386	=head1 SYNOPSIS
	387
	388	ann_exons_up_sql.pl --lav 'sp|P09488|GSTM1_HUMAN' | accession.file
	389
	390	=head1 OPTIONS
	391
	392	-h short help
	393	--help include description
	394	--gen_coord -- provide genomic exon start/stop coordinates as features
	395	--lav produce lav2plt.pl annotation format, only show domains/repeats
	396	--host, --user, --password, --port --db -- info for mysql database
	397
	398	=head1 DESCRIPTION
	399
	400	C<ann_exons_up_sql.pl> extracts exon location information from
	401	a mysql database (default name, uniprot) built from EBI/proteins API data.
	402
	403	Given a command line argument that contains a sequence accession
	404	(P09488) or identifier (GSTM1_HUMAN), the program looks up the
	405	features available for that sequence and returns them in a
	406	tab-delimited format:
	407
	408	>sp|P09488|GSTM1_HUMAN
	409	1 - 12 exon_1~1
	410	13 - 38 exon_2~2
	411	39 - 59 exon_3~3
	412	60 - 87 exon_4~4
	413	88 - 120 exon_5~5
	414	121 - 152 exon_6~6
	415	153 - 189 exon_7~7
	416	190 - 218 exon_8~8
	417
	418	C<ann_exons_up_sql.pl --gen_coord 'sp|P09488|GSTM1_HUMAN'> also provides genomic coordinates:
	419
	420	>sp|P09488|GSTM1_HUMAN
	421	1 - 12 exon_1~1
	422	1 < - exon_1::chr1:109687874
	423	12 > - exon_1::chr1:109687909
	424	13 - 37 exon_2~2
	425	13 < - exon_2::chr1:109688170
	426	37 > - exon_2::chr1:109688245
	427	38 - 59 exon_3~3
	428	38 < - exon_3::chr1:109688673
	429	59 > - exon_3::chr1:109688737
	430	...
	431	190 - 218 exon_8~8
	432	190 < - exon_8::chr1:109693206
	433	218 > - exon_8::chr1:109693292
	434
	435	C<ann_exons_up_sql.pl> is designed to be used by the B<FASTA> programs
	436	with the C<-V \!ann_exons_up_sql.pl> option, or by the
	437	C<annot_blast_btop.pl> script. It can also be used with the
	438	lav2plt.pl program with the C<--xA "\!ann_exons_up_sql.pl --lav"> or
	439	C<--yA "\!ann_exons_up_sql.pl --lav"> options.
	440
	441	=head1 AUTHOR
	442
	443	William R. Pearson, wrp@virginia.edu
	444
	445	=cut

+45

-22

scripts/ann_exons_up_www.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

16	16	# governing permissions and limitations under the License.
17	17	################################################################
18	18
19		# ann_exons_up_www.pl gets an annotation file from fasta36 -V with a line of the form:
20
21		# gi\|23065544\|ref\|NP_000552.2\|
22		#
23		# and returns the exons present in the protein from NCBI gff3 tables (human and mouse only)
	19	# ann_exons_up_www.pl gets an annotation file from fasta36 -V with a
	20	# line of the form:
	21	#
	22	# sp\|P09488\|GSTM1_HUMAN<tab>218
	23	#
	24	# and uses the EBI protein coordinate API to get the locations of exons
	25	# https://www.ebi.ac.uk/proteins/api/coordinates/P09488.json
24	26	#
25	27	# it must:
26	28	# (1) read in the line

28	30	# (3) get exon information from EBI/Uniprot
29	31	# (4) return the tab delimited exon boundaries
30	32
31		# 22-May-2017 -- use get("http://"), not get_https("https://"), because EBI does not have LWP::Protocol:https
32
	33	# 22-May-2017 -- use get("https://"), not get_https("https://"), because EBI does not have LWP::Protocol:https
	34
	35	# 11-Dec-2018 -- modified to include --gen_coord, which reports exon starts and stops in genomic coordinates as <, >
	36
	37	use warnings;
33	38	use strict;
34	39
35	40	use Getopt::Long;

41	46
42	47	use vars qw($host $db $port $user $pass);
43	48
44		my ($lav, $shelp, $help) = (0, 0,0);
	49	my ($lav, $gen_coord, $shelp, $help) = (0, 0, 0, 0);
45	50
46	51	my $color_sep_str = " :";
47	52	$color_sep_str = '~';
48	53
49	54	GetOptions(
	55	"gen_coord!" => \$gen_coord,
50	56	"lav" => \$lav,
51	57	"h\|?" => \$shelp,
52	58	"help" => \$help,

65	71	my $get_annot_sub = \&get_up_www_exons;
66	72
67	73	my $ua = LWP::UserAgent->new(ssl_opts=>{verify_hostname => 0});
68		my $uniprot_url = 'http://www.ebi.ac.uk/proteins/api/coordinates/';
	74	my $uniprot_url = 'https://www.ebi.ac.uk/proteins/api/coordinates/';
69	75	my $uniprot_suff = ".json";
70	76
71	77	# get the query

131	137
132	138	$acc =~ s/\.\d+$//;
133	139
	140	# my $exon_json = get_https($uniprot_url.$acc.$uniprot_suff);
134	141	my $exon_json = get($uniprot_url.$acc.$uniprot_suff);
135	142
136	143	unless (!$exon_json \|\| $exon_json =~ m/errorMessage/ \|\| $exon_json =~ m/Can not find/) {

144	151	my ($exon_json) = @_;
145	152
146	153	my @exons = ();
	154	my @ex_coords = ();
147	155
148	156	my $acc_exons = decode_json($exon_json);
149	157
150	158	my $exon_num = 1;
151	159	my $last_end = 0;
152	160	my $last_phase = 0;
	161
	162	my $chrom = $acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'chromosome'};
	163	my $rev_strand = $acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'reverseStrand'};
153	164
154	165	for my $exon ( @{$acc_exons->{'gnCoordinate'}[0]{'genomicLocation'}{'exon'}} ) {
155	166	my ($p_begin, $p_end) = ($exon->{'proteinLocation'}{'begin'}{'position'},$exon->{'proteinLocation'}{'end'}{'position'});

183	194	$last_end = $p_end;
184	195	$last_phase = $this_phase;
185	196
	197	my $info ="exon_".$exon_num.$color_sep_str.$exon_num;
	198
	199	my ($gs_begin, $gs_end) = ($g_begin, $g_end);
	200	if ($rev_strand) {
	201	($gs_begin, $gs_end) = ($g_end, $g_begin);
	202	}
	203
186	204	push @exons, {
187		info=>"exon_".$exon_num.$color_sep_str.$exon_num,
	205	info=>$info,
	206	exon_num=>$exon_num,
188	207	seq_start=>$p_begin,
189	208	seq_end=>$p_end,
	209	gen_seq_start=>$gs_begin,
	210	gen_seq_end=>$gs_end,
	211	chrom=>$chrom,
190	212	};
	213
191	214	$exon_num++;
192	215	}
193	216	}

204	227	}
205	228	else {
206	229	push @ex_feats, [$d_ref->{seq_start}, '-', $d_ref->{seq_end}, $d_ref->{info} ];
	230	if ($gen_coord) {
	231	my $chr=$d_ref->{chrom};
	232	if ($chr =~ m/^\d+$/ \|\| $chr =~m/^[XYZ]+$/) {
	233	$chr = "chr$chr";
	234	}
	235	my $ex_info = sprintf("exon_%d::%s:%d",$d_ref->{exon_num}, $chr, $d_ref->{gen_seq_start});
	236	push @ex_feats, [$d_ref->{seq_start},'<','-',$ex_info];
	237	$ex_info = sprintf("exon_%d::%s:%d",$d_ref->{exon_num}, $chr, $d_ref->{gen_seq_end});
	238	push @ex_feats, [$d_ref->{seq_end},'>','-',$ex_info];
	239	}
207	240	}
208	241	}
209	242	return \@ex_feats;

223	256	return $result;
224	257	}
225	258
226		sub domain_name {
227
228		my ($value) = @_;
229
230		if (!defined($domains{$value})) {
231		$domain_cnt++;
232		$domains{$value} = $domain_cnt;
233		}
234		return $value;
235		}
236
237	259	__END__
238	260
239	261	=pod

251	273	-h short help
252	274	--help include description
253	275	--lav produce lav2plt.pl annotation format
	276	--gen_coord produce genome coordinate features
254	277
255	278	=head1 DESCRIPTION
256	279

+2

-1

scripts/ann_feats2ipr.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

34	34	# ann_feats2ipr.pl is largely identical to ann_feats2l.pl, except that
35	35	# it uses Interpro for domain/repeat information.
36	36
	37	use warnings;
37	38	use strict;
38	39
39	40	use DBI;

+2

-1

scripts/ann_feats2ipr_e.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

34	34	# ann_feats2ipr.pl is largely identical to ann_feats2l.pl, except that
35	35	# it uses Interpro for domain/repeat information.
36	36
	37	use warnings;
37	38	use strict;
38	39
39	40	use DBI;

+2

-1

scripts/ann_feats_up_sql.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

29	29	# this version can read feature2 uniprot features (acc/pos/end/label/value), but returns sorted start/end domains
30	30	# modified 18-Jan-2016 to produce annotation symbols consistent with ann_feats_up_www2.pl
31	31
	32	use warnings;
32	33	use strict;
33	34
34	35	use DBI;

+22

-5

scripts/ann_feats_up_www2.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

17	17	################################################################
18	18
19	19	## modified 29-Sept-2016 to use EBI/proteins JSON URL:
20		## http://www.ebi.ac.uk/proteins/api/features/p12345
	20	## https://www.ebi.ac.uk/proteins/api/features/p12345
21	21
22	22	# ann_feats_up_www2.pl gets an annotation file from fasta36 -V with a line of the form:
23	23

31	31
32	32	# this version can read feature2 uniprot features (acc/pos/end/label/value), but returns sorted start/end domains
33	33
	34	use warnings;
34	35	use strict;
35	36
36	37	use Getopt::Long;
37	38	use Pod::Usage;
38	39	use LWP::Simple;
	40	use LWP::UserAgent;
39	41	use JSON qw(decode_json);
40	42
41	43	## use IO::String;
42	44
43		my $up_base = 'http://www.ebi.ac.uk/proteins/api/features';
	45	my $ua = LWP::UserAgent->new(ssl_opts=>{verify_hostname => 0});
	46	my $up_base = 'https://www.ebi.ac.uk/proteins/api/features';
	47	my $uniprot_suff = ".json";
44	48
45	49	my %domains = ();
46	50	my $domain_cnt = 0;

213	217	my $lwp_features = "";
214	218
215	219	if ($acc && ($acc =~ m/^[A-Z][0-9][A-Z0-9]{3}[0-9]/)) {
216		$lwp_features = get("$up_base/$acc.json");
	220	$lwp_features = get_https("$up_base/$acc.json");
217	221	}
218	222	# elsif ($id && ($id =~ m/^\w+$/)) {
219	223	# $lwp_features = get("$up_base/$id/$gff_post");

366	370	}
367	371	}
368	372
	373	sub get_https {
	374	my ($url) = @_;
	375
	376	my $result = "";
	377	my $response = $ua->get($url);
	378
	379	if ($response->is_success) {
	380	$result = $response->decoded_content;
	381	} else {
	382	$result = '';
	383	}
	384	return $result;
	385	}
369	386
370	387
371	388	__END__

398	415
399	416	C<ann_feats_up_www2.pl> extracts feature, domain, and repeat
400	417	information from the Uniprot DAS server through an XSLT transation
401		provided by http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/uniprotkb.
	418	provided by https://www.ebi.ac.uk/Tools/dbfetch/dbfetch/uniprotkb.
402	419	This server provides GFF descriptions of Uniprot entries, with most of
403	420	the information provided in UniProt feature tables.
404	421

+2

-1

scripts/ann_ipr_www.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

36	36	# (3) return the tab delimited domains
37	37	#
38	38
	39	use warnings;
39	40	use strict;
40	41
41	42	use Getopt::Long;

+2

-1

scripts/ann_pdb_cath.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

34	34	# database
35	35	#
36	36
	37	use warnings;
37	38	use strict;
38	39
39	40	use DBI;

+2

-1

scripts/ann_pdb_vast.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014, 2015 by William R. Pearson and The Rector &

34	34	# database
35	35	#
36	36
	37	use warnings;
37	38	use strict;
38	39
39	40	use LWP::Simple;

+0

-656

~~scripts/ann_pfam27.pl~~

0		#!/usr/bin/perl -w
1
2		################################################################
3		# copyright (c) 2014 by William R. Pearson and The Rector &
4		# Visitors of the University of Virginia */
5		################################################################
6		# Licensed under the Apache License, Version 2.0 (the "License");
7		# you may not use this file except in compliance with the License.
8		# You may obtain a copy of the License at
9		#
10		# http://www.apache.org/licenses/LICENSE-2.0
11		#
12		# Unless required by applicable law or agreed to in writing,
13		# software distributed under this License is distributed on an "AS
14		# IS" BASIS, WITHOUT WRRANTIES OR CONDITIONS OF ANY KIND, either
15		# express or implied. See the License for the specific language
16		# governing permissions and limitations under the License.
17		################################################################
18
19		# ann_pfam_e.pl gets an annotation file from fasta36 -V with a line of the form:
20
21		# gi\|62822551\|sp\|P00502\|GSTA1_RAT Glutathione S-transfer\n (at least from pir1.lseg)
22		#
23		# it must:
24		# (1) read in the line
25		# (2) parse it to get the up_acc
26		# (3) return the tab delimited features
27		#
28
29		# this version only annotates sequences known to Pfam:pfamseq:
30		# >pf26\|164\|O57809\|1A1D_PYRHO
31		# and only provides domain information
32
33		use strict;
34
35		use DBI;
36		use Getopt::Long;
37		use Pod::Usage;
38
39		use vars qw($host $db $port $user $pass);
40
41		my $hostname = `/bin/hostname`;
42
43		($host, $db, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "pfam27", 0, "web_user", "fasta_www");
44		#$host = 'xdb';
45
46		my ($auto_reg,$rpd2_fams, $vdoms, $neg_doms, $lav, $no_doms, $no_clans, $pf_acc, $no_over, $acc_comment, $shelp, $help) =
47		(0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0);
48		my ($min_nodom, $min_vdom) = (10,10);
49
50		my $color_sep_str = " :";
51		$color_sep_str = '~';
52
53
54		GetOptions(
55		"host=s" => \$host,
56		"db=s" => \$db,
57		"user=s" => \$user,
58		"password=s" => \$pass,
59		"port=i" => \$port,
60		"lav" => \$lav,
61		"acc_comment" => \$acc_comment,
62		"no-over" => \$no_over,
63		"no_over" => \$no_over,
64		"no-clans" => \$no_clans,
65		"no_clans" => \$no_clans,
66		"neg" => \$neg_doms,
67		"neg_doms" => \$neg_doms,
68		"neg-doms" => \$neg_doms,
69		"min_nodom=i" => \$min_nodom,
70		"pfacc" => \$pf_acc,
71		"RPD2" => \$rpd2_fams,
72		"auto_reg" => \$auto_reg,
73		"vdoms" => \$vdoms,
74		"h\|?" => \$shelp,
75		"help" => \$help,
76		);
77
78		pod2usage(1) if $shelp;
79		pod2usage(exitstatus => 0, verbose => 2) if $help;
80		pod2usage(1) unless (@ARGV \|\| -p STDIN \|\| -f STDIN);
81
82		my $connect = "dbi:mysql(AutoCommit=>1,RaiseError=>1):database=$db";
83		$connect .= ";host=$host" if $host;
84		$connect .= ";port=$port" if $port;
85
86		my $dbh = DBI->connect($connect,
87		$user,
88		$pass
89		) or die $DBI::errstr;
90
91		my %annot_types = ();
92		my %domains = (NODOM=>0);
93		my %domain_clan = (NODOM => {clan_id => 'NODOM', clan_acc=>0, domain_cnt=>0});
94		my @domain_list = (0);
95		my $domain_cnt = 0;
96
97		my $get_annot_sub = \&get_pfam_annots;
98
99		my $get_pfam_acc = $dbh->prepare(<<EOSQL);
100
101		SELECT seq_start, seq_end, model_start, model_end, model_length, auto_pfamA, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
102		FROM pfamseq
103		JOIN pfamA_reg_full_significant using(auto_pfamseq)
104		JOIN pfamA USING (auto_pfamA)
105		WHERE in_full = 1
106		AND pfamseq_acc=?
107		ORDER BY seq_start
108
109		EOSQL
110
111		my $get_pfam_refacc = $dbh->prepare(<<EOSQL);
112
113		SELECT seq_start, seq_end, auto_pfamA, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
114		FROM pfamseq
115		JOIN pfamA_reg_full_significant using(auto_pfamseq)
116		JOIN pfamA USING (auto_pfamA)
117		JOIN seqdb_demo2.annot as sa1 on(sa1.acc=pfamseq_acc and sa1.db='sp')
118		JOIN seqdb_demo2.annot as sa2 using(prot_id)
119		WHERE in_full = 1
120		AND sa2.acc=?
121		AND sa2.db='ref'
122		ORDER BY seq_start
123
124		EOSQL
125
126		my $get_annots_sql = $get_pfam_acc;
127
128		my $get_pfam_id = $dbh->prepare(<<EOSQL);
129
130		SELECT seq_start, seq_end, auto_pfamA, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
131		FROM pfamseq
132		JOIN pfamA_reg_full_significant using(auto_pfamseq)
133		JOIN pfamA USING (auto_pfamA)
134		WHERE in_full=1
135		AND pfamseq_id=?
136		ORDER BY seq_start
137
138		EOSQL
139
140		my $get_pfam_clan = $dbh->prepare(<<EOSQL);
141
142		SELECT clan_acc, clan_id
143		FROM clans
144		JOIN clan_membership using(auto_clan)
145		WHERE auto_pfamA=?
146
147		EOSQL
148
149		my $get_rpd2_clans = $dbh->prepare(<<EOSQL);
150
151		SELECT auto_pfamA, clan
152		FROM ljm_db.RPD2_final_fams
153		WHERE clan is not NULL
154
155		EOSQL
156
157		# -- LEFT JOIN clan_membership USING (auto_pfamA)
158		# -- LEFT JOIN clans using(auto_clan)
159
160		my ($tmp, $gi, $sdb, $acc, $id, $use_acc);
161
162		# get the query
163		my ($query, $seq_len) = @ARGV;
164		$seq_len = 0 unless defined($seq_len);
165
166		$query =~ s/^>// if ($query);
167
168		my @annots = ();
169
170		my %rpd2_clan_fams = ();
171
172		if ($rpd2_fams) {
173		$get_rpd2_clans->execute();
174		my ($auto_pfam, $auto_clan);
175		while (($auto_pfam, $auto_clan)=$get_rpd2_clans->fetchrow_array()) {
176		$rpd2_clan_fams{$auto_pfam} = $auto_clan;
177		}
178		}
179
180		#if it's a file I can open, read and parse it
181		unless ($query && $query =~ m/[\\|:]/) {
182
183		while (my $a_line = <>) {
184		$a_line =~ s/^>//;
185		chomp $a_line;
186		push @annots, show_annots($a_line, $get_annot_sub);
187		}
188		}
189		else {
190		push @annots, show_annots("$query $seq_len", $get_annot_sub);
191		}
192
193		for my $seq_annot (@annots) {
194		print ">",$seq_annot->{seq_info},"\n";
195		for my $annot (@{$seq_annot->{list}}) {
196		if (!$lav && defined($domains{$annot->[-1]})) {
197		my ($a_name, $a_num) = domain_num($annot->[-1],$domains{$annot->[-1]});
198		if ($acc_comment) {
199		$annot->[-1] .= $a_name."{$domain_list[$a_num]}";
200		}
201		$annot->[-1] = $a_name.$color_sep_str.$a_num;
202		}
203		print join("\t",@$annot),"\n";
204		}
205		}
206
207		exit(0);
208
209		sub show_annots {
210		my ($query_len, $get_annot_sub) = @_;
211
212		my ($annot_line, $seq_len) = split(/\s+/,$query_len);
213
214		my $pfamA_acc;
215
216		my %annot_data = (seq_info=>$annot_line);
217
218		$use_acc = 1;
219		$get_annots_sql = $get_pfam_acc;
220
221		if ($annot_line =~ m/^pf26\\|/) {
222		($sdb, $gi, $acc, $id) = split(/\\|/,$annot_line);
223		$dbh->do("use RPD2_pfam");
224		}
225		elsif ($annot_line =~ m/^gi\\|/) {
226		($tmp, $gi, $sdb, $acc, $id) = split(/\\|/,$annot_line);
227		if ($sdb =~ m/ref/) {
228		$get_annots_sql = $get_pfam_refacc;
229		}
230		}
231		elsif ($annot_line =~ m/^sp\\|/) {
232		($sdb, $acc, $id) = split(/\\|/,$annot_line);
233		}
234		elsif ($annot_line =~ m/^ref\\|/) {
235		($sdb, $acc) = split(/\\|/,$annot_line);
236		$get_annots_sql = $get_pfam_refacc;
237		}
238		elsif ($annot_line =~ m/^tr\\|/) {
239		($sdb, $acc, $id) = split(/\\|/,$annot_line);
240		}
241		elsif ($annot_line =~ m/^SP:/i) {
242		($sdb, $id) = split(/:/,$annot_line);
243		$use_acc = 0;
244		}
245		else {
246		$use_acc = 1;
247		($acc) = split(/\s+/,$annot_line);
248		}
249
250		# remove version number
251		unless ($use_acc) {
252		$get_annots_sql = $get_pfam_id;
253		$get_annots_sql->execute($id);
254		}
255		else {
256		$acc =~ s/\.\d+$//;
257		$get_annots_sql->execute($acc);
258		}
259
260		$annot_data{list} = $get_annot_sub->($get_annots_sql, $seq_len);
261
262		return \%annot_data;
263		}
264
265		sub get_pfam_annots {
266		my ($get_annots, $seq_length) = @_;
267
268		$seq_length = 0 unless $seq_length;
269
270		my @pf_domains = ();
271
272		# get the list of domains, sorted by start
273		while ( my $row_href = $get_annots->fetchrow_hashref()) {
274		if ($auto_reg) {
275		$row_href->{info} = $row_href->{auto_pfamA_reg_full};
276		}
277		elsif ($pf_acc) {
278		$row_href->{info} = $row_href->{pfamA_acc};
279		}
280		else {
281		$row_href->{info} = $row_href->{pfamA_id};
282		}
283
284		if ($row_href && $row_href->{length} > $seq_length && $seq_length == 0) { $seq_length = $row_href->{length};}
285
286		next if ($row_href->{seq_start} >= $seq_length);
287		if ($row_href->{seq_end} > $seq_length) {
288		$row_href->{seq_end} = $seq_length;
289		}
290
291		push @pf_domains, $row_href
292		}
293
294		# check for domain overlap, and resolve check for domain overlap
295		# (possibly more than 2 domains), choosing the domain with the best
296		# evalue
297
298		if($no_over && scalar(@pf_domains) > 1) {
299
300		my @tmp_domains = @pf_domains;
301		my @save_domains = ();
302
303		my $prev_dom = shift @tmp_domains;
304
305		while (my $curr_dom = shift @tmp_domains) {
306
307		my @overlap_domains = ($prev_dom);
308
309		my $diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
310		# check for overlap > domain_length/3
311
312		my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
313		my $inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) \|\|
314		(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})));
315
316		my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
317
318		while ($inclusion \|\| ($diff > 0 && $diff > $longer_len/3)) {
319		push @overlap_domains, $curr_dom;
320		$curr_dom = shift @tmp_domains;
321		last unless $curr_dom;
322		$diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
323		($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
324		$longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
325		$inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) \|\|
326		(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})));
327		}
328
329		# check for overlapping domains; >1 because $prev_dom is always there
330		if (scalar(@overlap_domains) > 1 ) {
331		# if $rpd2_fams, check for a chosen one
332		if ($rpd2_fams) {
333		for my $dom (@overlap_domains) {
334		if ($rpd2_clan_fams{$dom->{auto_pfamA}}) {
335		$prev_dom = $dom;
336		last;
337		}
338		}
339		}
340		else {
341		@overlap_domains = sort { $a->{evalue} <=> $b->{evalue} } @overlap_domains;
342		$prev_dom = $overlap_domains[0];
343		}
344		}
345
346		# $prev_dom should be the best of the overlaps, and we are no longer overlapping > dom_length/3
347		push @save_domains, $prev_dom;
348		$prev_dom = $curr_dom;
349		}
350		if ($prev_dom) {push @save_domains, $prev_dom;}
351
352		@pf_domains = @save_domains;
353
354		# now check for smaller overlaps
355		for (my $i=1; $i < scalar(@pf_domains); $i++) {
356		if ($pf_domains[$i-1]->{seq_end} >= $pf_domains[$i]->{seq_start}) {
357		my $overlap = $pf_domains[$i-1]->{seq_end} - $pf_domains[$i]->{seq_start};
358		$pf_domains[$i-1]->{seq_end} -= int($overlap/2);
359		$pf_domains[$i]->{seq_start} = $pf_domains[$i-1]->{seq_end}+1;
360		}
361		}
362		}
363
364		# $vdoms -- virtual Pfam domains -- the equivalent of $neg_doms,
365		# but covering parts of a Pfam model that are not annotated. split
366		# domains have been joined, so simply check beginning and end of
367		# each domain (but must also check for bounded-ness)
368		# only add when 10% or more is missing and missing length > $min_nodom
369
370		if ($vdoms && scalar(@pf_domains)) {
371		my @vpf_domains;
372
373		my $curr_dom = $pf_domains[0];
374		my $length = $curr_dom->{length};
375
376		my $prev_dom={seq_end=>0, pfamA_acc=>''};
377		my $prev_dom_end = 0;
378		my $next_dom_start = $length+1;
379
380		for (my $dom_ix=0; $dom_ix < scalar(@pf_domains); $dom_ix++ ) {
381		$curr_dom = $pf_domains[$dom_ix];
382
383		my $pfamA = $curr_dom->{pfamA_acc};
384
385		# first, look left, is there a domain there (if there is,
386		# it should be updated right
387
388		# my $min_vdom = $curr_dom->{model_length} / 10;
389
390		if ($prev_dom->{pfamA_acc}) { # look for previous domain
391		$prev_dom_end = $prev_dom->{seq_end};
392		}
393
394		# there is a domain to the left, how much room is available?
395		my $left_dom_len = min($curr_dom->{seq_start}-$prev_dom_end-1, $curr_dom->{model_start}-1);
396		if ( $left_dom_len > $min_vdom) {
397		# there is room for a virtual domain
398		my %new_dom = (seq_start=> $curr_dom->{seq_start}-$left_dom_len,
399		seq_end => $curr_dom->{seq_start}-1,
400		info=>'@'.$curr_dom->{info},
401		model_length=>$curr_dom->{model_length},
402		model_end => $curr_dom->{model_start}-1,
403		model_start => $left_dom_len,
404		pfamA_acc=>$pfamA,
405		);
406		push @vpf_domains, \%new_dom;
407		}
408
409		# save the current domain
410		push @vpf_domains, $curr_dom;
411		$prev_dom = $curr_dom;
412
413		if ($dom_ix < $#pf_domains) { # there is a domain to the right
414		# first, give all the extra space to the first domain (no splitting)
415		$next_dom_start = $pf_domains[$dom_ix+1]->{seq_start};
416		}
417		else {
418		$next_dom_start = $length;
419		}
420
421		# is there room for a virtual domain right
422
423		my $right_dom_len = min($next_dom_start-$curr_dom->{seq_end}-1, # space available
424		$curr_dom->{model_length}-$curr_dom->{model_end} # space needed
425		);
426		if ( $right_dom_len > $min_vdom) {
427		my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
428		seq_end=> $curr_dom->{seq_end}+$right_dom_len,
429		info=>'@'.$pfamA,
430		model_length => $curr_dom->{model_length},
431		pfamA_acc=> $pfamA,
432		);
433		push @vpf_domains, \%new_dom;
434		$prev_dom = \%new_dom;
435		}
436		} # all done, check for last one
437
438		# $curr_dom=$pf_domains[-1];
439		# # my $min_vdom = $curr_dom->{model_length}/10;
440
441		# my $right_dom_len = min($length - $curr_dom->{seq_end}+1, # space available
442		# $curr_dom->{model_length}-$curr_dom->{model_end} # space needed
443		# );
444		# if ($right_dom_len > $min_vdom) {
445		# my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
446		# seq_end => $curr_dom->{seq_end}+$right_dom_len,
447		# info=>'@'.$curr_dom->{pfamA_acc},
448		# model_len=> $curr_dom->{model_len},
449		# pfamA_acc => $curr_dom->{pfamA_acc},
450		# model_start => $curr_dom->{model_end}+1,
451		# model_end => $curr_dom->{model_len},
452		# );
453
454		# push @vpf_domains, \%new_dom;
455		# }
456
457		# @vpf_domains has both old @pf_domains and new neg-domains
458		@pf_domains = @vpf_domains;
459		}
460
461		if ($neg_doms) {
462		my @npf_domains;
463		my $prev_dom={seq_end=>0};
464		for my $curr_dom ( @pf_domains) {
465		if ($curr_dom->{seq_start} - $prev_dom->{seq_end} > $min_nodom) {
466		my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end => $curr_dom->{seq_start}-1, info=>'NODOM');
467		push @npf_domains, \%new_dom;
468		}
469		push @npf_domains, $curr_dom;
470		$prev_dom = $curr_dom;
471		}
472		if ($seq_length - $prev_dom->{seq_end} > $min_nodom) {
473		my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end=>$seq_length, info=>'NODOM');
474		if ($new_dom{seq_end} > $new_dom{seq_start}) {push @npf_domains, \%new_dom;}
475		}
476
477		# @npf_domains has both old @pf_domains and new neg-domains
478		@pf_domains = @npf_domains;
479		}
480
481		# now make sure we have useful names: colors
482
483		for my $pf (@pf_domains) {
484		$pf->{info} = domain_name($pf->{info}, $pf->{auto_pfamA}, $pf->{pfamA_acc});
485		}
486
487		my @feats = ();
488		for my $d_ref (@pf_domains) {
489		if ($lav) {
490		push @feats, [$d_ref->{seq_start}, $d_ref->{seq_end}, $d_ref->{info}];
491		}
492		else {
493		push @feats, [$d_ref->{seq_start}, '-', $d_ref->{seq_end}, $d_ref->{info} ];
494		# push @feats, [$d_ref->{seq_end}, ']', '-', ""];
495		}
496
497		}
498
499		return \@feats;
500		}
501
502		sub min {
503		my ($arg1, $arg2) = @_;
504
505		return ($arg1 <= $arg2 ? $arg1 : $arg2);
506		}
507
508		sub max {
509		my ($arg1, $arg2) = @_;
510
511		return ($arg1 >= $arg2 ? $arg1 : $arg2);
512		}
513
514		# domain name takes a uniprot domain label, removes comments ( ;
515		# truncated) and numbers and returns a canonical form. Thus:
516		# Cortactin 6.
517		# Cortactin 7; truncated.
518		# becomes "Cortactin"
519		#
520
521		sub domain_name {
522
523		my ($value, $pfamA_acc) = @_;
524		my $is_virtual = 0;
525
526		if ($value =~ m/^@/) {
527		$is_virtual = 1;
528		$value =~ s/^@//;
529		}
530
531		# check for clan:
532		if ($no_clans) {
533		if (! defined($domains{$value})) {
534		$domain_clan{$value} = 0;
535		$domains{$value} = ++$domain_cnt;
536		push @domain_list, $pfamA_acc;
537		}
538		}
539		elsif (!defined($domain_clan{$value})) {
540		## only do this for new domains, old domains have known mappings
541
542		## ways to highlight the same domain:
543		# (1) for clans, substitute clan name for family name
544		# (2) for clans, use the same color for the same clan, but don't change the name
545		# (3) for clans, combine family name with clan name, but use colors based on clan
546
547		# check to see if it's a clan
548		$get_pfam_clan->execute($pfamA_acc);
549
550		my $pfam_clan_href=0;
551
552		if ($pfam_clan_href=$get_pfam_clan->fetchrow_hashref()) { # is a clan
553		my ($clan_id, $clan_acc) = @{$pfam_clan_href}{qw(clan_id clan_acc)};
554
555		# now check to see if we have seen this clan before (if so, do not increment $domain_cnt)
556		my $c_value = "C." . $clan_id;
557		if ($pf_acc) {$c_value = $clan_acc;}
558
559		$domain_clan{$value} = {clan_id => $clan_id,
560		clan_acc => $clan_acc};
561
562		if ($domains{$c_value}) {
563		$domain_clan{$value}->{domain_cnt} = $domains{$c_value};
564		$value = $c_value;
565		}
566		else {
567		$domain_clan{$value}->{domain_cnt} = ++ $domain_cnt;
568		$value = $c_value;
569		$domains{$value} = $domain_cnt;
570		push @domain_list, $pfamA_acc;
571		}
572		}
573		else { # not a clan
574		$domain_clan{$value} = 0;
575		$domains{$value} = ++$domain_cnt;
576		push @domain_list, $pfamA_acc;
577		}
578		}
579		elsif ($domain_clan{$value} && $domain_clan{$value}->{clan_acc}) {
580		if ($pf_acc) {$value = $domain_clan{$value}->{clan_acc};}
581		else { $value = "C." . $domain_clan{$value}->{clan_id}; }
582		}
583
584		if ($is_virtual) {
585		$domains{'@'.$value} = $domains{$value};
586		$value = '@'.$value;
587		}
588		return $value;
589		}
590
591		sub domain_num {
592		my ($value, $number) = @_;
593		if ($value =~ m/^@/) {
594		$value =~ s/^@/v/;
595		# $number = $number."v";
596		}
597		return ($value, $number);
598		}
599
600		__END__
601
602		=pod
603
604		=head1 NAME
605
606		ann_feats.pl
607
608		=head1 SYNOPSIS
609
610		ann_pfam_e.pl --neg-doms 'sp\|P09488\|GSTM1_NUMAN' \| accession.file
611
612		=head1 OPTIONS
613
614		-h short help
615		--help include description
616		--no-over : generate non-overlapping domains (equivalent to ann_pfam.pl)
617		--no-clans : do not use clans with multiple families from same clan
618		--neg-doms : report domains between annotated domains as NODOM
619		(also --neg, --neg_doms)
620		--min_nodom=10 : minimum length between domains for NODOM
621
622		--host, --user, --password, --port --db : info for mysql database
623
624		=head1 DESCRIPTION
625
626		C<ann_pfam_e.pl> extracts domain information from the pfam msyql
627		database. Currently, the program works with database sequence
628		descriptions in one of two formats:
629
630		Currently, the program works with database
631		sequence descriptions in several formats:
632
633		>gi\|1705556\|sp\|P54670.1\|CAF1_DICDI
634		>sp\|P09488\|GSTM1_HUMAN
635		>sp:CALM_HUMAN
636
637		C<ann_pfam_e.pl> uses the C<pfamA_reg_full_significant>, C<pfamseq>,
638		and C<pfamA> tables of the C<pfam> database to extract domain
639		information on a protein.
640
641		If the "--no-over" option is set, overlapping domains are selected and
642		edited to remove overlaps. For proteins with multiple overlapping
643		domains (domains overlap by more than 1/3 of the domain length),
644		C<auto_pfam_e.pl> selects the domain annotation with the best
645		C<domain_evalue_score>. When domains overlap by less than 1/3 of the
646		domain length, they are shortened to remove the overlap.
647
648		C<ann_pfam_e.pl> is designed to be used by the B<FASTA> programs with
649		the C<-V \!ann_pfam_e.pl> or C<-V "\!ann_pfam_e.pl --neg"> option.
650
651		=head1 AUTHOR
652
653		William R. Pearson, wrp@virginia.edu
654
655		=cut

+0

-782

~~scripts/ann_pfam28.pl~~

0		#!/usr/bin/perl -w
1
2		################################################################
3		# copyright (c) 2014,2015 by William R. Pearson and The Rector &
4		# Visitors of the University of Virginia */
5		################################################################
6		# Licensed under the Apache License, Version 2.0 (the "License");
7		# you may not use this file except in compliance with the License.
8		# You may obtain a copy of the License at
9		#
10		# http://www.apache.org/licenses/LICENSE-2.0
11		#
12		# Unless required by applicable law or agreed to in writing,
13		# software distributed under this License is distributed on an "AS
14		# IS" BASIS, WITHOUT WRRANTIES OR CONDITIONS OF ANY KIND, either
15		# express or implied. See the License for the specific language
16		# governing permissions and limitations under the License.
17		################################################################
18
19		# ann_pfam.pl gets an annotation file from fasta36 -V with a line of the form:
20
21		# gi\|62822551\|sp\|P00502\|GSTA1_RAT Glutathione S-transfer\n (at least from pir1.lseg)
22		#
23		# it must:
24		# (1) read in the line
25		# (2) parse it to get the up_acc
26		# (3) return the tab delimited features
27		#
28
29		# this version only annotates sequences known to Pfam:pfamseq:
30		# and only provides domain information
31
32		use strict;
33
34		use DBI;
35		use Getopt::Long;
36		use Pod::Usage;
37
38		use vars qw($host $db $port $user $pass);
39
40		my $hostname = `/bin/hostname`;
41
42		($host, $db, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "pfam28", 0, "web_user", "fasta_www");
43		#$host = 'xdb';
44		#$host = 'localhost';
45		#$db = 'RPD2_pfam28u';
46
47		my ($auto_reg,$rpd2_fams, $neg_doms, $vdoms, $lav, $no_doms, $no_clans, $pf_acc, $acc_comment, $bound_comment, $shelp, $help) =
48		(0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0,);
49		my ($no_over, $split_over, $over_fract) = (0, 0, 3.0);
50
51		my $color_sep_str = " :";
52		$color_sep_str = '~';
53
54		my ($min_nodom, $min_vdom) = (10,10);
55
56		GetOptions(
57		"host=s" => \$host,
58		"db=s" => \$db,
59		"user=s" => \$user,
60		"password=s" => \$pass,
61		"port=i" => \$port,
62		"lav" => \$lav,
63		"acc_comment" => \$acc_comment,
64		"bound_comment" => \$bound_comment,
65		"no-over" => \$no_over,
66		"no_over" => \$no_over,
67		"split-over" => \$split_over,
68		"split_over" => \$split_over,
69		"over_fract" => \$over_fract,
70		"over-fract" => \$over_fract,
71		"no-clans" => \$no_clans,
72		"no_clans" => \$no_clans,
73		"neg" => \$neg_doms,
74		"neg_doms" => \$neg_doms,
75		"neg-doms" => \$neg_doms,
76		"min_nodom=i" => \$min_nodom,
77		"vdoms" => \$vdoms,
78		"v_doms" => \$vdoms,
79		"pfacc" => \$pf_acc,
80		"RPD2" => \$rpd2_fams,
81		"auto_reg" => \$auto_reg,
82		"h\|?" => \$shelp,
83		"help" => \$help,
84		);
85
86		pod2usage(1) if $shelp;
87		pod2usage(exitstatus => 0, verbose => 2) if $help;
88		pod2usage(1) unless (@ARGV \|\| -p STDIN \|\| -f STDIN);
89
90		my $connect = "dbi:mysql(AutoCommit=>1,RaiseError=>1):database=$db";
91		$connect .= ";host=$host" if $host;
92		$connect .= ";port=$port" if $port;
93
94		my $dbh = DBI->connect($connect,
95		$user,
96		$pass
97		) or die $DBI::errstr;
98
99		my %annot_types = ();
100		my %domains = (NODOM=>0);
101		my %domain_clan = (NODOM => {clan_id => 'NODOM', clan_acc=>0, domain_cnt=>0});
102		my @domain_list = (0);
103		my $domain_cnt = 0;
104
105		my $pfamA_reg_full = 'pfamA_reg_full_significant';
106
107		my $get_annot_sub = \&get_pfam_annots;
108
109		my @pfam_fields = qw(seq_start seq_end model_start model_end model_length pfamA_acc pfamA_id auto_pfamA_reg_full domain_evalue_score as evalue length);
110
111		my $get_pfam_acc = $dbh->prepare(<<EOSQL);
112		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
113		FROM pfamseq
114		JOIN $pfamA_reg_full using(pfamseq_acc)
115		JOIN pfamA USING (pfamA_acc)
116		WHERE in_full = 1
117		AND pfamseq_acc=?
118		ORDER BY seq_start
119
120		EOSQL
121
122		my $get_pfam_refacc = $dbh->prepare(<<EOSQL);
123
124		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
125		FROM pfamseq
126		JOIN $pfamA_reg_full using(pfamseq_acc)
127		JOIN pfamA USING (pfamA_acc)
128		JOIN seqdb_demo2.annot as sa1 on(sa1.acc=pfamseq_acc and sa1.db='sp')
129		JOIN seqdb_demo2.annot as sa2 using(prot_id)
130		WHERE in_full = 1
131		AND sa2.acc=?
132		AND sa2.db='ref'
133		ORDER BY seq_start
134
135		EOSQL
136
137		my $get_annots_sql = $get_pfam_acc;
138
139		my $get_pfam_id = $dbh->prepare(<<EOSQL);
140
141		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
142		FROM pfamseq
143		JOIN $pfamA_reg_full using(pfamseq_acc)
144		JOIN pfamA USING (pfamA_acc)
145		WHERE in_full=1
146		AND pfamseq_id=?
147		ORDER BY seq_start
148
149		EOSQL
150
151		my $get_pfam_clan = $dbh->prepare(<<EOSQL);
152
153		SELECT clan_acc, clan_id
154		FROM clan
155		JOIN clan_membership using(clan_acc)
156		WHERE pfamA_acc=?
157
158		EOSQL
159
160		my $get_rpd2_clans = $dbh->prepare(<<EOSQL);
161
162		SELECT auto_pfamA, clan
163		FROM ljm_db.RPD2_final_fams
164		WHERE clan is not NULL
165
166		EOSQL
167
168		# -- LEFT JOIN clan_membership USING (auto_pfamA)
169		# -- LEFT JOIN clans using(auto_clan)
170
171		my ($tmp, $gi, $sdb, $acc, $id, $use_acc);
172
173		# get the query
174		my ($query, $seq_len) = @ARGV;
175		$seq_len = 0 unless defined($seq_len);
176
177		$query =~ s/^>// if ($query);
178
179		my @annots = ();
180
181		my %rpd2_clan_fams = ();
182
183		if ($rpd2_fams) {
184		$get_rpd2_clans->execute();
185		my ($auto_pfam, $auto_clan);
186		while (($auto_pfam, $auto_clan)=$get_rpd2_clans->fetchrow_array()) {
187		$rpd2_clan_fams{$auto_pfam} = $auto_clan;
188		}
189		}
190
191		#if it's a file I can open, read and parse it
192		unless ($query && $query =~ m/[\\|:]/) {
193
194		while (my $a_line = <>) {
195		$a_line =~ s/^>//;
196		chomp $a_line;
197		push @annots, show_annots($a_line, $get_annot_sub);
198		}
199		}
200		else {
201		push @annots, show_annots("$query\t$seq_len", $get_annot_sub);
202		}
203
204		for my $seq_annot (@annots) {
205		print ">",$seq_annot->{seq_info},"\n";
206		for my $annot (@{$seq_annot->{list}}) {
207		if (!$lav && defined($domains{$annot->[-1]})) {
208		my ($a_name, $a_num) = domain_num($annot->[-1],$domains{$annot->[-1]});
209		$annot->[-1] = $a_name;
210		my $tmp_a_num = $a_num;
211		$tmp_a_num =~ s/v$//;
212		if ($acc_comment) {
213		$annot->[-1] .= "{$domain_list[$tmp_a_num]}";
214		}
215		if ($bound_comment) {
216		$annot->[-1] .= $color_sep_str.$annot->[0].":".$annot->[2];
217		}
218		$annot->[-1] .= $color_sep_str.$a_num;
219		}
220		print join("\t",@$annot),"\n";
221		}
222		}
223
224		exit(0);
225
226		sub show_annots {
227		my ($query_len, $get_annot_sub) = @_;
228
229		my ($annot_line, $seq_len) = split(/\t/,$query_len);
230
231		my $pfamA_acc;
232
233		my %annot_data = (seq_info=>$annot_line);
234
235		$use_acc = 1;
236		$get_annots_sql = $get_pfam_acc;
237
238		if ($annot_line =~ m/^pf\d+\\|/) {
239		($sdb, $gi, $pfamA_acc, $acc, $id) = split(/\\|/,$annot_line);
240		# $dbh->do("use RPD2_pfam");
241		}
242		elsif ($annot_line =~ m/^gi\\|/) {
243		($tmp, $gi, $sdb, $acc, $id) = split(/\\|/,$annot_line);
244		if ($sdb =~ m/ref/) {
245		$get_annots_sql = $get_pfam_refacc;
246		}
247		}
248		elsif ($annot_line =~ m/^(sp\|tr)\\|/) {
249		($sdb, $acc, $id) = split(/\\|/,$annot_line);
250		}
251		elsif ($annot_line =~ m/^ref\\|/) {
252		($sdb, $acc) = split(/\\|/,$annot_line);
253		$get_annots_sql = $get_pfam_refacc;
254		}
255		elsif ($annot_line =~ m/^(SP\|TR):/i) {
256		($sdb, $id) = split(/:/,$annot_line);
257		$use_acc = 0;
258		}
259		elsif ($annot_line !~ m/\\|/) { # new NCBI swissprot format
260		$use_acc =1;
261		$sdb = 'sp';
262		($acc) = split(/\s+/,$annot_line);
263		}
264
265		# remove version number
266		unless ($use_acc) {
267		$get_annots_sql = $get_pfam_id;
268		$get_annots_sql->execute($id);
269		} else {
270		unless ($acc) {
271		warn "missing acc in $annot_line";
272		next;
273		} else {
274		$acc =~ s/\.\d+$//;
275		$get_annots_sql->execute($acc);
276		}
277		}
278
279		$annot_data{list} = $get_annot_sub->($get_annots_sql, $seq_len);
280
281		return \%annot_data;
282		}
283
284		sub get_pfam_annots {
285		my ($get_annots, $seq_length) = @_;
286
287		$seq_length = 0 unless $seq_length;
288
289		my @pf_domains = ();
290
291		# get the list of domains, sorted by start
292
293		# $row_href has: seq_start, seq_end, model_start, model_end, model_length,
294		# pfamA_acc, pfamA_id, auto_pfamA_reg_full,
295		# domain_evalue_score as evalue, length
296
297		while ( my $row_href = $get_annots->fetchrow_hashref()) {
298		if ($auto_reg) {
299		$row_href->{info} = $row_href->{auto_pfamA_reg_full};
300		} elsif ($pf_acc) {
301		$row_href->{info} = $row_href->{pfamA_acc};
302		} else {
303		$row_href->{info} = $row_href->{pfamA_id};
304		}
305
306		if ($row_href && $row_href->{length} > $seq_length && $seq_length == 0) {
307		$seq_length = $row_href->{length};
308		}
309
310		next if ($row_href->{seq_start} >= $seq_length);
311		if ($row_href->{seq_end} > $seq_length) {
312		$row_href->{seq_end} = $seq_length;
313		}
314
315		push @pf_domains, $row_href
316		}
317
318		# before checking for domain overlap, check for "split-domains"
319		# (self-unbound) by looking for runs of the same domain that are
320		# ordered by model_start
321
322		if (scalar(@pf_domains) > 1) {
323		my @j_domains; #joined domains
324		my @tmp_domains = @pf_domains;
325
326		my $prev_dom = shift(@tmp_domains);
327
328		for my $curr_dom (@tmp_domains) {
329		# to join domains:
330		# (1) the domains must be in order by model_start/end coordinates
331		# (3) joining the domains cannot make the total combination too long
332
333		# check for model and sequence consistency
334		if (($prev_dom->{pfamA_acc} eq $curr_dom->{pfamA_acc}) # same family
335		&& $prev_dom->{model_start} < $curr_dom->{model_start} # model check
336		&& $prev_dom->{model_end} < $curr_dom->{model_end}
337
338		&& ($curr_dom->{model_start} > $prev_dom->{model_end} * 0.80 # limit overlap
339		\|\| $curr_dom->{model_start} < $prev_dom->{model_end} * 1.25)
340		&& ((($curr_dom->{model_end} - $curr_dom->{model_start}+1)/$curr_dom->{model_length} +
341		($prev_dom->{model_end} - $prev_dom->{model_start}+1)/$prev_dom->{model_length}) < 1.33)
342		) { # join them by updating $prev_dom
343		$prev_dom->{seq_end} = $curr_dom->{seq_end};
344		$prev_dom->{model_end} = $curr_dom->{model_end};
345		$prev_dom->{auto_pfamA_reg_full} = $prev_dom->{auto_pfamA_reg_full} . ";". $curr_dom->{auto_pfamA_reg_full};
346		$prev_dom->{evalue} = ($prev_dom->{evalue} < $curr_dom->{evalue} ? $prev_dom->{evalue} : $curr_dom->{evalue});
347		} else {
348		push @j_domains, $prev_dom;
349		$prev_dom = $curr_dom;
350		}
351		}
352		push @j_domains, $prev_dom;
353		@pf_domains = @j_domains;
354
355
356		if ($no_over) { # for either $no_over or $split_over, check for overlapping domains and edit/split them
357
358		my @tmp_domains = @pf_domains; # allow shifts from copy of @pf_domains
359		my @save_domains = (); # where the new domains go
360
361		my $prev_dom = shift @tmp_domains;
362
363		while (my $curr_dom = shift @tmp_domains) {
364
365		my @overlap_domains = ($prev_dom);
366
367		my $diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
368
369		my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1,
370		$curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
371
372		my $inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) # start is right && end is left
373		&& ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) \|\| # -- curr inside prev
374		(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) # start is left && end is right
375		&& ($curr_dom->{seq_end} >= $prev_dom->{seq_end}))); # -- prev is inside curr
376
377		my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
378
379		# check for overlap > domain_length/$over_fract
380		while ($inclusion \|\| ($diff > 0 && $diff > $longer_len/$over_fract)) {
381		push @overlap_domains, $curr_dom;
382		$curr_dom = shift @tmp_domains;
383		last unless $curr_dom;
384		$diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
385		($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
386		$longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
387		$inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) \|\|
388		(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})));
389		}
390
391		# check for overlapping domains; >1 because $prev_dom is always there
392		if (scalar(@overlap_domains) > 1 ) {
393		# if $rpd2_fams, check for a chosen one
394
395		for my $dom ( @overlap_domains) {
396		$dom->{evalue} = 1.0 unless defined($dom->{evalue});
397		}
398
399		@overlap_domains = sort { $a->{evalue} <=> $b->{evalue} } @overlap_domains;
400		$prev_dom = $overlap_domains[0];
401		}
402
403		# $prev_dom should be the best of the overlaps, and we are no longer overlapping > dom_length/3
404		push @save_domains, $prev_dom;
405		$prev_dom = $curr_dom;
406		}
407
408		if ($prev_dom) {
409		push @save_domains, $prev_dom;
410		}
411
412		@pf_domains = @save_domains;
413
414		# now check for smaller overlaps
415		for (my $i=1; $i < scalar(@pf_domains); $i++) {
416		if ($pf_domains[$i-1]->{seq_end} >= $pf_domains[$i]->{seq_start}) {
417		my $overlap = $pf_domains[$i-1]->{seq_end} - $pf_domains[$i]->{seq_start};
418		$pf_domains[$i-1]->{seq_end} -= int($overlap/2);
419		$pf_domains[$i]->{seq_start} = $pf_domains[$i-1]->{seq_end}+1;
420		}
421		}
422		}
423		elsif ($split_over) { # here, everything that overlaps by > $min_vdom should be split into a separate domain
424		my @save_domains = (); # where the new domains go
425
426		# check to see if one domain is included (or overlapping) more
427		# than xx% of the other. If so, pick the longer one
428
429		my ($prev_dom, $curr_dom) = ($pf_domains[0],0) ;
430		for (my $i=1; $i < scalar(@pf_domains); $i++) {
431		$curr_dom = $pf_domains[$i];
432
433		my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
434		my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
435
436		if (($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})
437		&& $cur_len / $prev_len > 0.80) {
438		# $prev_dom stays the same, $curr_dom deleted
439		next;
440		}
441		elsif (($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})
442		&& $prev_len / $cur_len > 0.80) {
443		$prev_dom = $curr_dom; # this should delete $prev_dom
444		next;
445		}
446
447		if ($prev_dom->{seq_end} >= $curr_dom->{seq_start} + $min_vdom) {
448		my ($l_seq_end, $r_seq_start) = ($curr_dom->{seq_start}-1, $prev_dom->{seq_end}+1);
449
450		$prev_dom->{seq_end} = $l_seq_end;
451		push @save_domains, $prev_dom;
452		my $new_dom = {seq_start => $l_seq_end+1, seq_end=>$r_seq_start-1,
453		model_length => -1,
454		pfamA_acc=>$prev_dom->{pfamA_acc}."/".$curr_dom->{pfamA_acc},
455		pfamA_id=>$prev_dom->{pfamA_id}."/".$curr_dom->{pfamA_id},
456		};
457
458		if ($pf_acc) {
459		$new_dom->{info} = $new_dom->{pfamA_acc};
460		}
461		else {
462		$new_dom->{info} = $new_dom->{pfamA_id};
463		}
464
465		push @save_domains, $new_dom;
466		$curr_dom->{seq_start} = $r_seq_start;
467		$prev_dom = $curr_dom;
468		}
469		else {
470		push @save_domains, $prev_dom;
471		$prev_dom = $curr_dom;
472		}
473		}
474		push @save_domains, $prev_dom;
475		@pf_domains = @save_domains;
476		}
477		}
478
479		# $vdoms -- virtual Pfam domains -- the equivalent of $neg_doms,
480		# but covering parts of a Pfam model that are not annotated. split
481		# domains have been joined, so simply check beginning and end of
482		# each domain (but must also check for bounded-ness)
483		# only add when 10% or more is missing and missing length > $min_nodom
484
485		if ($vdoms && scalar(@pf_domains)) {
486		my @vpf_domains;
487
488		my $curr_dom = $pf_domains[0];
489		my $length = $curr_dom->{length};
490
491		my $prev_dom={seq_end=>0, pfamA_acc=>''};
492		my $prev_dom_end = 0;
493		my $next_dom_start = $length+1;
494
495		for (my $dom_ix=0; $dom_ix < scalar(@pf_domains); $dom_ix++ ) {
496		$curr_dom = $pf_domains[$dom_ix];
497
498		my $pfamA = $curr_dom->{pfamA_acc};
499
500		# first, look left, is there a domain there (if there is,
501		# it should be updated right
502
503		# my $min_vdom = $curr_dom->{model_length} / 10;
504
505		if ($curr_dom->{model_length} < $min_vdom) {
506		push @vpf_domains, $curr_dom;
507		next;
508		}
509		if ($prev_dom->{pfamA_acc}) { # look for previous domain
510		$prev_dom_end = $prev_dom->{seq_end};
511		}
512
513		# there is a domain to the left, how much room is available?
514		my $left_dom_len = min($curr_dom->{seq_start}-$prev_dom_end-1, $curr_dom->{model_start}-1);
515		if ( $left_dom_len > $min_vdom) {
516		# there is room for a virtual domain
517		my %new_dom = (seq_start=> $curr_dom->{seq_start}-$left_dom_len,
518		seq_end => $curr_dom->{seq_start}-1,
519		info=>'@'.$curr_dom->{info},
520		model_length=>$curr_dom->{model_length},
521		model_end => $curr_dom->{model_start}-1,
522		model_start => $left_dom_len,
523		pfamA_acc=>$pfamA,
524		);
525		push @vpf_domains, \%new_dom;
526		}
527
528		# save the current domain
529		push @vpf_domains, $curr_dom;
530		$prev_dom = $curr_dom;
531
532		if ($dom_ix < $#pf_domains) { # there is a domain to the right
533		# first, give all the extra space to the first domain (no splitting)
534		$next_dom_start = $pf_domains[$dom_ix+1]->{seq_start};
535		}
536		else {
537		$next_dom_start = $length;
538		}
539
540		# is there room for a virtual domain right
541
542		my $right_dom_len = min($next_dom_start-$curr_dom->{seq_end}-1, # space available
543		$curr_dom->{model_length}-$curr_dom->{model_end} # space needed
544		);
545		if ( $right_dom_len > $min_vdom) {
546		my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
547		seq_end=> $curr_dom->{seq_end}+$right_dom_len,
548		info=>'@'.$curr_dom->{info},
549		model_length => $curr_dom->{model_length},
550		pfamA_acc=> $pfamA,
551		);
552		push @vpf_domains, \%new_dom;
553		$prev_dom = \%new_dom;
554		}
555		} # all done, check for last one
556
557		# $curr_dom=$pf_domains[-1];
558		# # my $min_vdom = $curr_dom->{model_length}/10;
559
560		# my $right_dom_len = min($length - $curr_dom->{seq_end}+1, # space available
561		# $curr_dom->{model_length}-$curr_dom->{model_end} # space needed
562		# );
563		# if ($right_dom_len > $min_vdom) {
564		# my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
565		# seq_end => $curr_dom->{seq_end}+$right_dom_len,
566		# info=>'@'.$curr_dom->{pfamA_acc},
567		# model_len=> $curr_dom->{model_len},
568		# pfamA_acc => $curr_dom->{pfamA_acc},
569		# model_start => $curr_dom->{model_end}+1,
570		# model_end => $curr_dom->{model_len},
571		# );
572
573		# push @vpf_domains, \%new_dom;
574		# }
575
576		# @vpf_domains has both old @pf_domains and new neg-domains
577		@pf_domains = @vpf_domains;
578		}
579
580		if ($neg_doms) {
581		my @npf_domains;
582		my $prev_dom={seq_end=>0};
583		for my $curr_dom ( @pf_domains) {
584		if ($curr_dom->{seq_start} - $prev_dom->{seq_end} > $min_nodom) {
585		my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end => $curr_dom->{seq_start}-1, info=>'NODOM');
586		push @npf_domains, \%new_dom;
587		}
588		push @npf_domains, $curr_dom;
589		$prev_dom = $curr_dom;
590		}
591		if ($seq_length - $prev_dom->{seq_end} > $min_nodom) {
592		my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end=>$seq_length, info=>'NODOM');
593		if ($new_dom{seq_end} > $new_dom{seq_start}) {
594		push @npf_domains, \%new_dom;
595		}
596		}
597
598		# @npf_domains has both old @pf_domains and new neg-domains
599		@pf_domains = @npf_domains;
600		}
601
602		# now make sure we have useful names: colors
603
604		for my $pf (@pf_domains) {
605		$pf->{info} = domain_name($pf->{info}, $pf->{pfamA_acc});
606		}
607
608		my @feats = ();
609		for my $d_ref (@pf_domains) {
610		if ($lav) {
611		push @feats, [$d_ref->{seq_start}, $d_ref->{seq_end}, $d_ref->{info}];
612		} else {
613		push @feats, [$d_ref->{seq_start}, '-', $d_ref->{seq_end}, $d_ref->{info} ];
614		# push @feats, [$d_ref->{seq_end}, ']', '-', ""];
615		}
616
617		}
618
619		return \@feats;
620		}
621
622		sub min {
623		my ($arg1, $arg2) = @_;
624
625		return ($arg1 <= $arg2 ? $arg1 : $arg2);
626		}
627
628		sub max {
629		my ($arg1, $arg2) = @_;
630
631		return ($arg1 >= $arg2 ? $arg1 : $arg2);
632		}
633
634		# domain name takes a uniprot domain label, removes comments ( ;
635		# truncated) and numbers and returns a canonical form. Thus:
636		# Cortactin 6.
637		# Cortactin 7; truncated.
638		# becomes "Cortactin"
639		#
640
641		sub domain_name {
642
643		my ($value, $pfamA_acc) = @_;
644		my $is_virtual = 0;
645
646		if ($value =~ m/^@/) {
647		$is_virtual = 1;
648		$value =~ s/^@//;
649		}
650
651		# check for clan:
652		if ($no_clans) {
653		if (! defined($domains{$value})) {
654		$domain_clan{$value} = 0;
655		$domains{$value} = ++$domain_cnt;
656		push @domain_list, $pfamA_acc;
657		}
658		}
659		elsif (!defined($domain_clan{$value})) {
660		## only do this for new domains, old domains have known mappings
661
662		## ways to highlight the same domain:
663		# (1) for clans, substitute clan name for family name
664		# (2) for clans, use the same color for the same clan, but don't change the name
665		# (3) for clans, combine family name with clan name, but use colors based on clan
666
667		# check to see if it's a clan
668		$get_pfam_clan->execute($pfamA_acc);
669
670		my $pfam_clan_href=0;
671
672		if ($pfam_clan_href=$get_pfam_clan->fetchrow_hashref()) { # is a clan
673		my ($clan_id, $clan_acc) = @{$pfam_clan_href}{qw(clan_id clan_acc)};
674
675		# now check to see if we have seen this clan before (if so, do not increment $domain_cnt)
676		my $c_value = "C." . $clan_id;
677		if ($pf_acc) {$c_value = $clan_acc;}
678
679		$domain_clan{$value} = {clan_id => $clan_id,
680		clan_acc => $clan_acc};
681
682		if ($domains{$c_value}) {
683		$domain_clan{$value}->{domain_cnt} = $domains{$c_value};
684		$value = $c_value;
685		}
686		else {
687		$domain_clan{$value}->{domain_cnt} = ++ $domain_cnt;
688		$value = $c_value;
689		$domains{$value} = $domain_cnt;
690		push @domain_list, $pfamA_acc;
691		}
692		}
693		else { # not a clan
694		$domain_clan{$value} = 0;
695		$domains{$value} = ++$domain_cnt;
696		push @domain_list, $pfamA_acc;
697		}
698		}
699		elsif ($domain_clan{$value} && $domain_clan{$value}->{clan_acc}) {
700		if ($pf_acc) {$value = $domain_clan{$value}->{clan_acc};}
701		else { $value = "C." . $domain_clan{$value}->{clan_id}; }
702		}
703
704		if ($is_virtual) {
705		$domains{'@'.$value} = $domains{$value};
706		$value = '@'.$value;
707		}
708		return $value;
709		}
710
711		sub domain_num {
712		my ($value, $number) = @_;
713		if ($value =~ m/^@/) {
714		$value =~ s/^@/v/;
715		$number = $number."v";
716		}
717		return ($value, $number);
718		}
719
720
721		__END__
722
723		=pod
724
725		=head1 NAME
726
727		ann_pfam28.pl
728
729		=head1 SYNOPSIS
730
731		ann_pfam28.pl --neg-doms --vdoms 'sp\|P09488\|GSTM1_HUMAN' \| accession.file
732
733		=head1 OPTIONS
734
735		-h short help
736		--help include description
737		--no-over : generate non-overlapping domains (equivalent to ann_pfam.pl)
738		--split-over : overlaps of two domains generate a new hybrid domain
739		--no-clans : do not use clans with multiple families from same clan
740		--neg-doms : report domains between annotated domains as NODOM
741		(also --neg, --neg_doms)
742		--vdoms : produce "virtual domains" using model_start,
743		model_end for partial pfam domains
744		--min_nodom=10 : minimum length between domains for NODOM
745
746		--host, --user, --password, --port, --db : info for mysql database
747
748		=head1 DESCRIPTION
749
750		C<ann_pfam28.pl> extracts domain information from the pfam mysql
751		database. Currently, the program works with database
752		sequence descriptions in several formats:
753
754		>gi\|1705556\|sp\|P54670.1\|CAF1_DICDI
755		>sp\|P09488\|GSTM1_HUMAN
756		>sp:CALM_HUMAN
757
758		C<ann_pfam28.pl> uses the C<pfamA_reg_full_significant>, C<pfamseq>,
759		and C<pfamA> tables of the C<pfam> database to extract domain
760		information on a protein.
761
762		If the C<--no-over> option is set, overlapping domains are selected and
763		edited to remove overlaps. For proteins with multiple overlapping
764		domains (domains overlap by more than 1/3 of the domain length),
765		C<ann_pfam28.pl> selects the domain annotation with the best
766		C<domain_evalue_score>. When domains overlap by less than 1/3 of the
767		domain length, they are shortened to remove the overlap.
768
769		If the C<--split-over> option is set and two domains overlap, the
770		overlapping region is split out of the domains and labeled as a new,
771		virtual-like domain. If one domain is internal to another and spans
772		more than 80% of it, the shorter domain is removed.
773
774		C<ann_pfam28.pl> is designed to be used by the B<FASTA> programs with
775		the C<-V \!ann_pfam28.pl> or C<-V "\!ann_pfam28.pl --neg"> option.
776
777		=head1 AUTHOR
778
779		William R. Pearson, wrp@virginia.edu
780
781		=cut
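
The C<--split-over> behavior described in the POD above can be sketched for a single pair of overlapping domains. This is a simplified illustration, not the script's own code: `split_overlap`, the domain names, and the coordinates are hypothetical. It assumes the domains are sorted by `seq_start` and that neither is contained in the other, and it omits the 80% inclusion check.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the FASTA scripts): split the region shared
# by two overlapping domains into a separate hybrid domain.
# Each domain is a hash ref with seq_start, seq_end, and pfamA_id keys.
sub split_overlap {
    my ($prev, $curr, $min_vdom) = @_;
    $min_vdom = 10 unless defined $min_vdom;

    # leave the pair alone unless the overlap is long enough to split out
    return ($prev, $curr)
        if $prev->{seq_end} < $curr->{seq_start} + $min_vdom;

    my ($ov_start, $ov_end) = ($curr->{seq_start}, $prev->{seq_end});

    my %left   = (%$prev, seq_end   => $ov_start - 1);   # prev, ending before the overlap
    my %right  = (%$curr, seq_start => $ov_end + 1);     # curr, starting after the overlap
    my %hybrid = (
        seq_start => $ov_start,
        seq_end   => $ov_end,
        pfamA_id  => "$prev->{pfamA_id}/$curr->{pfamA_id}",  # label names both families
    );
    return (\%left, \%hybrid, \%right);
}

# Illustrative domains only.
my @parts = split_overlap(
    {seq_start => 1,  seq_end => 100, pfamA_id => 'DomA'},
    {seq_start => 80, seq_end => 200, pfamA_id => 'DomB'},
);
print join("\t", @{$_}{qw(seq_start seq_end pfamA_id)}), "\n" for @parts;
```

With these inputs the pair becomes three regions: `DomA` at 1-79, the hybrid `DomA/DomB` at 80-100, and `DomB` at 101-200.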

+0

-859

~~scripts/ann_pfam30.pl~~

0		#!/usr/bin/perl -w
1
2		################################################################
3		# copyright (c) 2014,2015 by William R. Pearson and The Rector &
4		# Visitors of the University of Virginia */
5		################################################################
6		# Licensed under the Apache License, Version 2.0 (the "License");
7		# you may not use this file except in compliance with the License.
8		# You may obtain a copy of the License at
9		#
10		# http://www.apache.org/licenses/LICENSE-2.0
11		#
12		# Unless required by applicable law or agreed to in writing,
13		# software distributed under this License is distributed on an "AS
14		# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
15		# express or implied. See the License for the specific language
16		# governing permissions and limitations under the License.
17		################################################################
18
19		# ann_pfam.pl gets an annotation file from fasta36 -V with a line of the form:
20
21		# gi\|62822551\|sp\|P00502\|GSTA1_RAT Glutathione S-transfer\n (at least from pir1.lseg)
22		#
23		# it must:
24		# (1) read in the line
25		# (2) parse it to get the up_acc
26		# (3) return the tab delimited features
27		#
28
29		# this is the first version that works with the new Pfam strategy of
30		# separating Uniprot reference sequences from the rest of uniprot. as
31		# a result, it is possible that 2 SQL queries will be required, one to
32		# pfamA_reg_full_significant and a second to uniprot_reg_full.
33
34		# modified 15-Jan-2017 to reduce the number of calls when the same
35		# accession is present multiple times. Accessions are saved in a hash
36		# that ensures uniqueness. (Could also speed things up by creating temporary table.)
37		#
38
39
40		use strict;
41
42		use DBI;
43		use Getopt::Long;
44		use Pod::Usage;
45
46		use vars qw($host $db $port $user $pass);
47
48		my $hostname = `/bin/hostname`;
49
50		($host, $db, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "pfam31", 0, "web_user", "fasta_www");
51		#$host = 'xdb';
52		#$host = 'localhost';
53		#$db = 'RPD2_pfam28u';
54
55		my ($auto_reg,$rpd2_fams, $neg_doms, $vdoms, $lav, $no_doms, $no_clans, $pf_acc, $acc_comment, $bound_comment, $shelp, $help) =
56		(0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0,);
57		my ($no_over, $split_over, $over_fract) = (0, 0, 3.0);
58
59		my ($color_sep_str, $show_color) = (" :",1);
60		$color_sep_str = '~';
61
62		my ($min_nodom, $min_vdom) = (10,10);
63
64		GetOptions(
65		"host=s" => \$host,
66		"db=s" => \$db,
67		"user=s" => \$user,
68		"password=s" => \$pass,
69		"port=i" => \$port,
70		"lav" => \$lav,
71		"acc_comment" => \$acc_comment,
72		"bound_comment" => \$bound_comment,
73		"color!" => \$show_color,
74		"no-over" => \$no_over,
75		"no_over" => \$no_over,
76		"split-over" => \$split_over,
77		"split_over" => \$split_over,
78		"over_fract" => \$over_fract,
79		"over-fract" => \$over_fract,
80		"no-clans" => \$no_clans,
81		"no_clans" => \$no_clans,
82		"neg" => \$neg_doms,
83		"neg_doms" => \$neg_doms,
84		"neg-doms" => \$neg_doms,
85		"min_nodom=i" => \$min_nodom,
86		"vdoms" => \$vdoms,
87		"v_doms" => \$vdoms,
88		"pfacc" => \$pf_acc,
89		"RPD2" => \$rpd2_fams,
90		"auto_reg" => \$auto_reg,
91		"h\|?" => \$shelp,
92		"help" => \$help,
93		);
94
95		pod2usage(1) if $shelp;
96		pod2usage(exitstatus => 0, verbose => 2) if $help;
97		pod2usage(1) unless (@ARGV \|\| -p STDIN \|\| -f STDIN);
98
99		my $connect = "dbi:mysql(AutoCommit=>1,RaiseError=>1):database=$db";
100		$connect .= ";host=$host" if $host;
101		$connect .= ";port=$port" if $port;
102
103		my $dbh = DBI->connect($connect,
104		$user,
105		$pass
106		) or die $DBI::errstr;
107
108		my %annot_types = ();
109		my %domains = (NODOM=>0);
110		my %domain_clan = (NODOM => {clan_id => 'NODOM', clan_acc=>0, domain_cnt=>0});
111		my @domain_list = (0);
112		my $domain_cnt = 0;
113
114		my $pfamA_reg_full = 'pfamA_reg_full_significant';
115		my $uniprot_reg_full = 'uniprot_reg_full';
116
117		my $get_annot_sub = \&get_pfam_annots;
118
119		my @pfam_fields = qw(seq_start seq_end model_start model_end model_length pfamA_acc pfamA_id auto_pfamA_reg_full domain_evalue_score as evalue length);
120		my @upfam_fields = qw(seq_start seq_end model_start model_end model_length pfamA_acc pfamA_id auto_uniprot_reg_full domain_evalue_score as evalue length);
121
122		my $get_pfam_acc = $dbh->prepare(<<EOSQL);
123		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
124		FROM pfamseq
125		JOIN pfamA_reg_full_significant using(pfamseq_acc)
126		JOIN pfamA USING (pfamA_acc)
127		WHERE in_full = 1
128		AND pfamseq_acc=?
129		ORDER BY seq_start
130
131		EOSQL
132
133		my $get_upfam_acc = $dbh->prepare(<<EOSQL);
134		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_uniprot_reg_full as auto_pfamA_reg_full, domain_evalue_score as evalue, length
135		FROM uniprot
136		JOIN uniprot_reg_full using(uniprot_acc)
137		JOIN pfamA USING (pfamA_acc)
138		WHERE in_full = 1
139		AND uniprot_acc=?
140		ORDER BY seq_start
141
142		EOSQL
143
144		my $get_pfam_refacc = $dbh->prepare(<<EOSQL);
145		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
146		FROM $pfamA_reg_full
147		JOIN pfamseq using(pfamseq_acc)
148		JOIN pfamA USING (pfamA_acc)
149		JOIN uniprot.refseq2up as rf2up on(rf2up.up_acc=pfamseq_acc)
150		WHERE in_full = 1
151		AND rf2up.refseq_acc=?
152		ORDER BY seq_start
153
154		EOSQL
155
156		my $get_upfam_refacc = $dbh->prepare(<<EOSQL);
157		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_uniprot_reg_full as auto_pfamA_reg_full, domain_evalue_score as evalue, length
158		FROM uniprot
159		JOIN uniprot_reg_full using(uniprot_acc)
160		JOIN pfamA USING (pfamA_acc)
161		JOIN uniprot.refseq2up as rf2up on(rf2up.up_acc=uniprot_acc)
162		WHERE in_full = 1
163		AND refseq_acc=?
164		ORDER BY seq_start
165
166		EOSQL
167
168		my $get_annots_sql = $get_pfam_acc;
169
170		my $get_pfam_id = $dbh->prepare(<<EOSQL);
171		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
172		FROM pfamseq
173		JOIN $pfamA_reg_full using(pfamseq_acc)
174		JOIN pfamA USING (pfamA_acc)
175		WHERE in_full=1
176		AND pfamseq_id=?
177		ORDER BY seq_start
178
179		EOSQL
180
181		my $get_upfam_id = $dbh->prepare(<<EOSQL);
182		SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_uniprot_reg_full as auto_pfamA_reg_full, domain_evalue_score as evalue, length
183		FROM uniprot
184		JOIN uniprot_reg_full using(pfamseq_acc)
185		JOIN pfamA USING (pfamA_acc)
186		WHERE in_full=1
187		AND uniprot_id=?
188		ORDER BY seq_start
189
190		EOSQL
191
192		my $get_pfam_clan = $dbh->prepare(<<EOSQL);
193
194		SELECT clan_acc, clan_id
195		FROM clan
196		JOIN clan_membership using(clan_acc)
197		WHERE pfamA_acc=?
198
199		EOSQL
200
201		my $get_rpd2_clans = $dbh->prepare(<<EOSQL);
202
203		SELECT auto_pfamA, clan
204		FROM ljm_db.RPD2_final_fams
205		WHERE clan is not NULL
206
207		EOSQL
208
209		# -- LEFT JOIN clan_membership USING (auto_pfamA)
210		# -- LEFT JOIN clans using(auto_clan)
211
212		my ($tmp, $gi, $sdb, $acc, $id, $use_acc);
213
214		# get the query
215		my ($query, $seq_len) = @ARGV;
216		$seq_len = 0 unless defined($seq_len);
217
218		$query =~ s/^>// if ($query);
219
220		my @annots = ();
221		my %annot_set = ();
222
223		my %rpd2_clan_fams = ();
224
225		if ($rpd2_fams) {
226		$get_rpd2_clans->execute();
227		my ($auto_pfam, $auto_clan);
228		while (($auto_pfam, $auto_clan)=$get_rpd2_clans->fetchrow_array()) {
229		$rpd2_clan_fams{$auto_pfam} = $auto_clan;
230		}
231		}
232
233		#if it's a file I can open, read and parse it
234		unless ($query && ($query =~ m/[\\|:]/ \|\|
235		$query =~ m/^[NX]P_/ \|\|
236		$query =~ m/^[OPQ][0-9][A-Z0-9]{3}[0-9]\|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}\s/)) {
237
238		while (my $a_line = <>) {
239		$a_line =~ s/^>//;
240		chomp $a_line;
241		push @annots, show_annots($a_line, $get_annot_sub);
242		}
243		}
244		else {
245		push @annots, show_annots("$query\t$seq_len", $get_annot_sub);
246		}
247
248		for my $seq_annot (@annots) {
249		next unless $seq_annot;
250		my $annot_r = $annot_set{$seq_annot};
251		print ">",$annot_r->{seq_info},"\n";
252		for my $annot (@{$annot_r->{list}}) {
253		if (!$lav && defined($domains{$annot->[-1]})) {
254		my ($a_name, $a_num) = domain_num($annot->[-1],$domains{$annot->[-1]});
255		$annot->[-1] = $a_name;
256		my $tmp_a_num = $a_num;
257		$tmp_a_num =~ s/v$//;
258		if ($acc_comment) {
259		$annot->[-1] .= "{$domain_list[$tmp_a_num]}";
260		}
261		if ($bound_comment) {
262		$annot->[-1] .= $color_sep_str.$annot->[0].":".$annot->[2];
263		}
264		elsif ($show_color) {
265		$annot->[-1] .= $color_sep_str.$a_num;
266		}
267		}
268		print join("\t",@$annot),"\n";
269		}
270		}
271
272		exit(0);
273
274		sub show_annots {
275		my ($query_len, $get_annot_sub) = @_;
276
277		my ($annot_line, $seq_len) = split(/\t/,$query_len);
278
279		my $pfamA_acc;
280
281		$use_acc = 1;
282		$get_annots_sql = $get_pfam_acc;
283
284		my $get_annots_sql_u = $get_upfam_acc;
285
286		if ($annot_line =~ m/^pf\d+\\|/) {
287		($sdb, $gi, $pfamA_acc, $acc, $id) = split(/\\|/,$annot_line);
288		# $dbh->do("use RPD2_pfam");
289		}
290		elsif ($annot_line =~ m/^gi\\|/) {
291		($tmp, $gi, $sdb, $acc, $id) = split(/\\|/,$annot_line);
292		if ($sdb =~ m/ref/) {
293		$get_annots_sql = $get_pfam_refacc;
294		$get_annots_sql_u = $get_upfam_refacc;
295		}
296		}
297		elsif ($annot_line =~ m/^(sp\|tr\|up)\\|/) {
298		($sdb, $acc, $id) = split(/\\|/,$annot_line);
299		}
300		elsif ($annot_line =~ m/^ref\\|/) {
301		($sdb, $acc) = split(/\\|/,$annot_line);
302		$get_annots_sql = $get_pfam_refacc;
303		$get_annots_sql_u = $get_upfam_refacc;
304		}
305		elsif ($annot_line =~ m/^(SP\|TR):/i) {
306		($sdb, $id) = split(/:/,$annot_line);
307		$use_acc = 0;
308		}
309		elsif ($annot_line !~ m/\\|/ && $annot_line !~ m/:/) {
310		$use_acc = 1;
311		($acc) = split(/\s+/,$annot_line);
312		}
313		# deal with no-database SwissProt/NR
314		else {
315		($acc)=($annot_line =~ /^(\S+)/);
316		}
317
318		# here we have an $acc or an $id: check to see if we have the data
319
320		my %annot_data = (seq_info=>$annot_line);
321		my $annot_key = '';
322		unless ($use_acc) {
323		return "" if ($annot_set{$id});
324		$annot_set{$id} = \%annot_data;
325		$annot_key = $id;
326
327		$get_annots_sql = $get_pfam_id;
328		$get_annots_sql->execute($id);
329		unless ($get_annots_sql->rows()) {
330		$get_annots_sql = $get_annots_sql_u;
331		$get_annots_sql->execute($id);
332		}
333		} else {
334		unless ($acc) {
335		warn "missing acc in $annot_line";
336		return "";
337		}
338		else {
339		$acc =~ s/\.\d+$//;
340
341		$annot_key = $acc;
342		if ($annot_set{$acc}) {
343		goto ret_label;
344		}
345		$annot_set{$acc} = \%annot_data;
346
347		$get_annots_sql->execute($acc);
348		unless ($get_annots_sql->rows()) {
349		$get_annots_sql = $get_annots_sql_u;
350		$get_annots_sql->execute($acc);
351		}
352		}
353		}
354
355		$annot_data{list} = $get_annot_sub->($get_annots_sql, $seq_len);
356
357		ret_label:
358		return $annot_key;
359		}
360
361		sub get_pfam_annots {
362		my ($get_annots, $seq_length) = @_;
363
364		$seq_length = 0 unless $seq_length;
365
366		my @pf_domains = ();
367
368		# get the list of domains, sorted by start
369
370		# $row_href has: seq_start, seq_end, model_start, model_end, model_length,
371		# pfamA_acc, pfamA_id, auto_pfamA_reg_full,
372		# domain_evalue_score as evalue, length
373
374		while ( my $row_href = $get_annots->fetchrow_hashref()) {
375		if ($auto_reg) {
376		$row_href->{info} = $row_href->{auto_pfamA_reg_full};
377		} elsif ($pf_acc) {
378		$row_href->{info} = $row_href->{pfamA_acc};
379		} else {
380		$row_href->{info} = $row_href->{pfamA_id};
381		}
382
383		if ($row_href && $row_href->{length} > $seq_length && $seq_length == 0) {
384		$seq_length = $row_href->{length};
385		}
386
387		next if ($row_href->{seq_start} >= $seq_length);
388		if ($row_href->{seq_end} > $seq_length) {
389		$row_href->{seq_end} = $seq_length;
390		}
391
392		push @pf_domains, $row_href
393		}
394
395		# before checking for domain overlap, check for "split-domains"
396		# (self-unbound) by looking for runs of the same domain that are
397		# ordered by model_start
398
399		if (scalar(@pf_domains) > 1) {
400		my @j_domains; #joined domains
401		my @tmp_domains = @pf_domains;
402
403		my $prev_dom = shift(@tmp_domains);
404
405		for my $curr_dom (@tmp_domains) {
406		# to join domains:
407		# (1) the domains must be in order by model_start/end coordinates
408		# (3) joining the domains cannot make the total combination too long
409
410		# check for model and sequence consistency
411		if (($prev_dom->{pfamA_acc} eq $curr_dom->{pfamA_acc}) # same family
412		&& $prev_dom->{model_start} < $curr_dom->{model_start} # model check
413		&& $prev_dom->{model_end} < $curr_dom->{model_end}
414
415		&& ($curr_dom->{model_start} > $prev_dom->{model_end} * 0.80 # limit overlap
416		\|\| $curr_dom->{model_start} < $prev_dom->{model_end} * 1.25)
417		&& ((($curr_dom->{model_end} - $curr_dom->{model_start}+1)/$curr_dom->{model_length} +
418		($prev_dom->{model_end} - $prev_dom->{model_start}+1)/$prev_dom->{model_length}) < 1.33)
419		) { # join them by updating $prev_dom
420		$prev_dom->{seq_end} = $curr_dom->{seq_end};
421		$prev_dom->{model_end} = $curr_dom->{model_end};
422		$prev_dom->{auto_pfamA_reg_full} = $prev_dom->{auto_pfamA_reg_full} . ";". $curr_dom->{auto_pfamA_reg_full};
423		$prev_dom->{evalue} = ($prev_dom->{evalue} < $curr_dom->{evalue} ? $prev_dom->{evalue} : $curr_dom->{evalue});
424		} else {
425		push @j_domains, $prev_dom;
426		$prev_dom = $curr_dom;
427		}
428		}
429		push @j_domains, $prev_dom;
430		@pf_domains = @j_domains;
431
432
433		if ($no_over) { # for either $no_over or $split_over, check for overlapping domains and edit/split them
434
435		my @tmp_domains = @pf_domains; # allow shifts from copy of @pf_domains
436		my @save_domains = (); # where the new domains go
437
438		my $prev_dom = shift @tmp_domains;
439
440		while (my $curr_dom = shift @tmp_domains) {
441
442		my @overlap_domains = ($prev_dom);
443
444		my $diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
445
446		my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1,
447		$curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
448
449		my $inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) # start is right && end is left
450		&& ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) \|\| # -- curr inside prev
451		(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) # start is left && end is right
452		&& ($curr_dom->{seq_end} >= $prev_dom->{seq_end}))); # -- prev is inside curr
453
454		my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
455
456		# check for overlap > domain_length/$over_fract
457		while ($inclusion \|\| ($diff > 0 && $diff > $longer_len/$over_fract)) {
458		push @overlap_domains, $curr_dom;
459		$curr_dom = shift @tmp_domains;
460		last unless $curr_dom;
461		$diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
462		($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
463		$longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
464		$inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) \|\|
465		(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})));
466		}
467
468		# check for overlapping domains; >1 because $prev_dom is always there
469		if (scalar(@overlap_domains) > 1 ) {
470		# if $rpd2_fams, check for a chosen one
471
472		for my $dom ( @overlap_domains) {
473		$dom->{evalue} = 1.0 unless defined($dom->{evalue});
474		}
475
476		@overlap_domains = sort { $a->{evalue} <=> $b->{evalue} } @overlap_domains;
477		$prev_dom = $overlap_domains[0];
478		}
479
480		# $prev_dom should be the best of the overlaps, and we are no longer overlapping > dom_length/3
481		push @save_domains, $prev_dom;
482		$prev_dom = $curr_dom;
483		}
484
485		if ($prev_dom) {
486		push @save_domains, $prev_dom;
487		}
488
489		@pf_domains = @save_domains;
490
491		# now check for smaller overlaps
492		for (my $i=1; $i < scalar(@pf_domains); $i++) {
493		if ($pf_domains[$i-1]->{seq_end} >= $pf_domains[$i]->{seq_start}) {
494		my $overlap = $pf_domains[$i-1]->{seq_end} - $pf_domains[$i]->{seq_start};
495		$pf_domains[$i-1]->{seq_end} -= int($overlap/2);
496		$pf_domains[$i]->{seq_start} = $pf_domains[$i-1]->{seq_end}+1;
497		}
498		}
499		}
500		elsif ($split_over) { # here, everything that overlaps by > $min_vdom should be split into a separate domain
501		my @save_domains = (); # where the new domains go
502
503		# check to see if one domain is included (or overlapping) more
504		# than xx% of the other. If so, pick the longer one
505
506		my ($prev_dom, $curr_dom) = ($pf_domains[0],0) ;
507		for (my $i=1; $i < scalar(@pf_domains); $i++) {
508		$curr_dom = $pf_domains[$i];
509
510		my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
511		my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
512
513		if (($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})
514		&& $cur_len / $prev_len > 0.80) {
515		# $prev_dom stays the same, $curr_dom deleted
516		next;
517		}
518		elsif (($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})
519		&& $prev_len / $cur_len > 0.80) {
520		$prev_dom = $curr_dom; # this should delete $prev_dom
521		next;
522		}
523
524		if ($prev_dom->{seq_end} >= $curr_dom->{seq_start} + $min_vdom) {
525		my ($l_seq_end, $r_seq_start) = ($curr_dom->{seq_start}-1, $prev_dom->{seq_end}+1);
526
527		$prev_dom->{seq_end} = $l_seq_end;
528		push @save_domains, $prev_dom;
529		my $new_dom = {seq_start => $l_seq_end+1, seq_end=>$r_seq_start-1,
530		model_length => -1,
531		pfamA_acc=>$prev_dom->{pfamA_acc}."/".$curr_dom->{pfamA_acc},
532		pfamA_id=>$prev_dom->{pfamA_id}."/".$curr_dom->{pfamA_id},
533		};
534
535		if ($pf_acc) {
536		$new_dom->{info} = $new_dom->{pfamA_acc};
537		}
538		else {
539		$new_dom->{info} = $new_dom->{pfamA_id};
540		}
541
542		push @save_domains, $new_dom;
543		$curr_dom->{seq_start} = $r_seq_start;
544		$prev_dom = $curr_dom;
545		}
546		else {
547		push @save_domains, $prev_dom;
548		$prev_dom = $curr_dom;
549		}
550		}
551		push @save_domains, $prev_dom;
552		@pf_domains = @save_domains;
553		}
554		}
555
556		# $vdoms -- virtual Pfam domains -- the equivalent of $neg_doms,
557		# but covering parts of a Pfam model that are not annotated. split
558		# domains have been joined, so simply check beginning and end of
559		# each domain (but must also check for bounded-ness)
560		# only add when 10% or more is missing and missing length > $min_nodom
561
562		if ($vdoms && scalar(@pf_domains)) {
563		my @vpf_domains;
564
565		my $curr_dom = $pf_domains[0];
566		my $length = $curr_dom->{length};
567
568		my $prev_dom={seq_end=>0, pfamA_acc=>''};
569		my $prev_dom_end = 0;
570		my $next_dom_start = $length+1;
571
572		for (my $dom_ix=0; $dom_ix < scalar(@pf_domains); $dom_ix++ ) {
573		$curr_dom = $pf_domains[$dom_ix];
574
575		my $pfamA = $curr_dom->{pfamA_acc};
576
577		# first, look left: is there a domain there? (if there is,
578		# it should be updated right)
579
580		# my $min_vdom = $curr_dom->{model_length} / 10;
581
582		if ($curr_dom->{model_length} < $min_vdom) {
583		push @vpf_domains, $curr_dom;
584		next;
585		}
586		if ($prev_dom->{pfamA_acc}) { # look for previous domain
587		$prev_dom_end = $prev_dom->{seq_end};
588		}
589
590		# there is a domain to the left, how much room is available?
591		my $left_dom_len = min($curr_dom->{seq_start}-$prev_dom_end-1, $curr_dom->{model_start}-1);
592		if ( $left_dom_len > $min_vdom) {
593		# there is room for a virtual domain
594		my %new_dom = (seq_start=> $curr_dom->{seq_start}-$left_dom_len,
595		seq_end => $curr_dom->{seq_start}-1,
596		info=>'@'.$curr_dom->{info},
597		model_length=>$curr_dom->{model_length},
598		model_end => $curr_dom->{model_start}-1,
599		model_start => $left_dom_len,
600		pfamA_acc=>$pfamA,
601		);
602		push @vpf_domains, \%new_dom;
603		}
604
605		# save the current domain
606		push @vpf_domains, $curr_dom;
607		$prev_dom = $curr_dom;
608
609		if ($dom_ix < $#pf_domains) { # there is a domain to the right
610		# first, give all the extra space to the first domain (no splitting)
611		$next_dom_start = $pf_domains[$dom_ix+1]->{seq_start};
612		}
613		else {
614		$next_dom_start = $length;
615		}
616
617		# is there room for a virtual domain right
618
619		my $right_dom_len = min($next_dom_start-$curr_dom->{seq_end}-1, # space available
620		$curr_dom->{model_length}-$curr_dom->{model_end} # space needed
621		);
622		if ( $right_dom_len > $min_vdom) {
623		my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
624		seq_end=> $curr_dom->{seq_end}+$right_dom_len,
625		info=>'@'.$curr_dom->{info},
626		model_length => $curr_dom->{model_length},
627		pfamA_acc=> $pfamA,
628		);
629		push @vpf_domains, \%new_dom;
630		$prev_dom = \%new_dom;
631		}
632		} # all done, check for last one
633
634		# $curr_dom=$pf_domains[-1];
635		# # my $min_vdom = $curr_dom->{model_length}/10;
636
637		# my $right_dom_len = min($length - $curr_dom->{seq_end}+1, # space available
638		# $curr_dom->{model_length}-$curr_dom->{model_end} # space needed
639		# );
640		# if ($right_dom_len > $min_vdom) {
641		# my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
642		# seq_end => $curr_dom->{seq_end}+$right_dom_len,
643		# info=>'@'.$curr_dom->{pfamA_acc},
644		# model_len=> $curr_dom->{model_len},
645		# pfamA_acc => $curr_dom->{pfamA_acc},
646		# model_start => $curr_dom->{model_end}+1,
647		# model_end => $curr_dom->{model_len},
648		# );
649
650		# push @vpf_domains, \%new_dom;
651		# }
652
653		# @vpf_domains has both old @pf_domains and new neg-domains
654		@pf_domains = @vpf_domains;
655		}
656
657		if ($neg_doms) {
658		my @npf_domains;
659		my $prev_dom={seq_end=>0};
660		for my $curr_dom ( @pf_domains) {
661		if ($curr_dom->{seq_start} - $prev_dom->{seq_end} > $min_nodom) {
662		my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end => $curr_dom->{seq_start}-1, info=>'NODOM');
663		push @npf_domains, \%new_dom;
664		}
665		push @npf_domains, $curr_dom;
666		$prev_dom = $curr_dom;
667		}
668		if ($seq_length - $prev_dom->{seq_end} > $min_nodom) {
669		my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end=>$seq_length, info=>'NODOM');
670		if ($new_dom{seq_end} > $new_dom{seq_start}) {
671		push @npf_domains, \%new_dom;
672		}
673		}
674
675		# @npf_domains has both old @pf_domains and new neg-domains
676		@pf_domains = @npf_domains;
677		}
678
679		# now make sure we have useful names: colors
680
681		for my $pf (@pf_domains) {
682		$pf->{info} = domain_name($pf->{info}, $pf->{pfamA_acc});
683		}
684
685		my @feats = ();
686		for my $d_ref (@pf_domains) {
687		if ($lav) {
688		push @feats, [$d_ref->{seq_start}, $d_ref->{seq_end}, $d_ref->{info}];
689		} else {
690		push @feats, [$d_ref->{seq_start}, '-', $d_ref->{seq_end}, $d_ref->{info} ];
691		# push @feats, [$d_ref->{seq_end}, ']', '-', ""];
692		}
693
694		}
695
696		return \@feats;
697		}
698
699		sub min {
700		my ($arg1, $arg2) = @_;
701
702		return ($arg1 <= $arg2 ? $arg1 : $arg2);
703		}
704
705		sub max {
706		my ($arg1, $arg2) = @_;
707
708		return ($arg1 >= $arg2 ? $arg1 : $arg2);
709		}
710
711		# domain name takes a uniprot domain label, removes comments ( ;
712		# truncated) and numbers and returns a canonical form. Thus:
713		# Cortactin 6.
714		# Cortactin 7; truncated.
715		# becomes "Cortactin"
716		#
717
718		sub domain_name {
719
720		my ($value, $pfamA_acc) = @_;
721		my $is_virtual = 0;
722
723		if ($value =~ m/^@/) {
724		$is_virtual = 1;
725		$value =~ s/^@//;
726		}
727
728		# check for clan:
729		if ($no_clans) {
730		if (! defined($domains{$value})) {
731		$domain_clan{$value} = 0;
732		$domains{$value} = ++$domain_cnt;
733		push @domain_list, $pfamA_acc;
734		}
735		}
736		elsif (!defined($domain_clan{$value})) {
737		## only do this for new domains, old domains have known mappings
738
739		## ways to highlight the same domain:
740		# (1) for clans, substitute clan name for family name
741		# (2) for clans, use the same color for the same clan, but don't change the name
742		# (3) for clans, combine family name with clan name, but use colors based on clan
743
744		# check to see if it's a clan
745		$get_pfam_clan->execute($pfamA_acc);
746
747		my $pfam_clan_href=0;
748
749		if ($pfam_clan_href=$get_pfam_clan->fetchrow_hashref()) { # is a clan
750		my ($clan_id, $clan_acc) = @{$pfam_clan_href}{qw(clan_id clan_acc)};
751
752		# now check to see if we have seen this clan before (if so, do not increment $domain_cnt)
753		my $c_value = "C." . $clan_id;
754		if ($pf_acc) {$c_value = $clan_acc;}
755
756		$domain_clan{$value} = {clan_id => $clan_id,
757		clan_acc => $clan_acc};
758
759		if ($domains{$c_value}) {
760		$domain_clan{$value}->{domain_cnt} = $domains{$c_value};
761		$value = $c_value;
762		}
763		else {
764		$domain_clan{$value}->{domain_cnt} = ++ $domain_cnt;
765		$value = $c_value;
766		$domains{$value} = $domain_cnt;
767		push @domain_list, $pfamA_acc;
768		}
769		}
770		else { # not a clan
771		$domain_clan{$value} = 0;
772		$domains{$value} = ++$domain_cnt;
773		push @domain_list, $pfamA_acc;
774		}
775		}
776		elsif ($domain_clan{$value} && $domain_clan{$value}->{clan_acc}) {
777		if ($pf_acc) {$value = $domain_clan{$value}->{clan_acc};}
778		else { $value = "C." . $domain_clan{$value}->{clan_id}; }
779		}
780
781		if ($is_virtual) {
782		$domains{'@'.$value} = $domains{$value};
783		$value = '@'.$value;
784		}
785		return $value;
786		}
787
788		sub domain_num {
789		my ($value, $number) = @_;
790		if ($value =~ m/^@/) {
791		$value =~ s/^@/v/;
792		$number = $number."v";
793		}
794		return ($value, $number);
795		}
796
797
798		__END__
799
800		=pod
801
802		=head1 NAME
803
804		ann_pfam30.pl
805
806		=head1 SYNOPSIS
807
808		ann_pfam30.pl --neg-doms --vdoms 'sp\|P09488\|GSTM1_HUMAN' \| accession.file
809
810		=head1 OPTIONS
811
812		-h short help
813		--help include description
814		--no-over : generate non-overlapping domains (equivalent to ann_pfam.pl)
815		--split-over : overlaps of two domains generate a new hybrid domain
816		--no-clans : do not use clans with multiple families from same clan
817		--neg-doms : report domains between annotated domains as NODOM
818		(also --neg, --neg_doms)
819		--vdoms : produce "virtual domains" using model_start,
820		model_end for partial pfam domains
821		--min_nodom=10 : minimum length between domains for NODOM
822
823		--host, --user, --password, --port, --db : info for mysql database
824
825		=head1 DESCRIPTION
826
827		C<ann_pfam30.pl> extracts domain information from the pfam mysql
828		database. Currently, the program works with database
829		sequence descriptions in several formats:
830
831		>gi\|1705556\|sp\|P54670.1\|CAF1_DICDI
832		>sp\|P09488\|GSTM1_HUMAN
833		>sp:CALM_HUMAN
834
835		C<ann_pfam30.pl> uses the C<pfamA_reg_full_significant>, C<pfamseq>,
836		and C<pfamA> tables of the C<pfam> database to extract domain
837		information on a protein.
838
839		If the C<--no-over> option is set, overlapping domains are selected and
840		edited to remove overlaps. For proteins with multiple overlapping
841		domains (domains overlap by more than 1/3 of the domain length),
842		C<ann_pfam30.pl> selects the domain annotation with the best
843		C<domain_evalue_score>. When domains overlap by less than 1/3 of the
844		domain length, they are shortened to remove the overlap.
845
846		If the C<--split-over> option is set and two domains overlap, the
847		overlapping region is split out of the domains and labeled as a new,
848		virtual-like, domain. If one domain is internal to another and spans
849		more than 80% of it, the shorter domain is removed.
850
851		C<ann_pfam30.pl> is designed to be used by the B<FASTA> programs with
852		the C<-V \!ann_pfam30.pl> or C<-V "\!ann_pfam30.pl --neg"> option.
853
854		=head1 AUTHOR
855
856		William R. Pearson, wrp@virginia.edu
857
858		=cut
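
The DESCRIPTION above says that with `--no-over`, domains that overlap by less than 1/3 of the domain length are shortened to remove the overlap. The following is a minimal sketch of that trimming step, modeled on the "now check for smaller overlaps" loop in `get_pfam_annots`. The helper name `trim_overlaps` and the domain coordinates are hypothetical and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ann_pfam30.pl): given domains sorted by
# seq_start, shorten adjacent domains so that they no longer overlap.
# The overlapping region is divided roughly in half between the neighbors.
sub trim_overlaps {
    my @doms = map { +{ %$_ } } @_;    # shallow copies; the input is not modified
    for my $i (1 .. $#doms) {
        my ($left, $right) = @doms[$i - 1, $i];
        next if $left->{seq_end} < $right->{seq_start};    # no overlap
        my $overlap = $left->{seq_end} - $right->{seq_start};
        $left->{seq_end} -= int($overlap / 2);
        $right->{seq_start} = $left->{seq_end} + 1;
    }
    return @doms;
}

# Illustrative coordinates only
my @trimmed = trim_overlaps(
    { seq_start => 10, seq_end => 100, info => 'DomA' },
    { seq_start => 91, seq_end => 200, info => 'DomB' },
);
print join("\t", @{$_}{qw(seq_start seq_end info)}), "\n" for @trimmed;
```

With these coordinates, the first domain is shortened to end at 96 and the second starts at 97, so each residue belongs to at most one domain.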

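The `--neg-doms` and `--min_nodom` options described in the POD above report unannotated stretches between domains as `NODOM`. The following is a sketch of that gap-filling logic under simplifying assumptions. The helper `add_nodoms`, the sequence length, and the domain coordinates are hypothetical, and the real script applies this step after its overlap and virtual-domain handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ann_pfam30.pl): insert NODOM features
# into gaps longer than $min_nodom between domains sorted by seq_start.
sub add_nodoms {
    my ($seq_length, $min_nodom, @doms) = @_;
    my @out;
    my $prev_end = 0;
    for my $dom (@doms) {
        if ($dom->{seq_start} - $prev_end > $min_nodom) {
            push @out, { seq_start => $prev_end + 1,
                         seq_end   => $dom->{seq_start} - 1,
                         info      => 'NODOM' };
        }
        push @out, $dom;
        $prev_end = $dom->{seq_end};
    }
    # trailing gap between the last domain and the end of the sequence
    if ($seq_length - $prev_end > $min_nodom) {
        push @out, { seq_start => $prev_end + 1,
                     seq_end   => $seq_length,
                     info      => 'NODOM' };
    }
    return @out;
}

# Illustrative values only: a 300-residue sequence with two domains
my @feats = add_nodoms(300, 10,
    { seq_start => 40,  seq_end => 120, info => 'DomA' },
    { seq_start => 125, seq_end => 250, info => 'DomB' },
);
print join("\t", $_->{seq_start}, '-', $_->{seq_end}, $_->{info}), "\n" for @feats;
```

Here the 4-residue gap between the two domains is below the threshold and is left unlabeled, while the regions before the first domain and after the last one are long enough to be reported as `NODOM`.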
+2

-1

scripts/ann_pfam30_tmptbl.pl less more

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

39	39	# create temporary tables/select permissions for tmp_annot
40	40	#
41	41
	42	use warnings;
42	43	use strict;
43	44
44	45	use DBI;

+878

-0

scripts/ann_pfam_sql.pl less more

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2015 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia */
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	# ann_pfam.pl gets an annotation file from fasta36 -V with a line of the form:
	20
	21	# gi\|62822551\|sp\|P00502\|GSTA1_RAT Glutathione S-transfer\n (at least from pir1.lseg)
	22	#
	23	# it must:
	24	# (1) read in the line
	25	# (2) parse it to get the up_acc
	26	# (3) return the tab delimited features
	27	#
	28
	29	# this is the first version that works with the new Pfam strategy of
	30	# separating Uniprot reference sequences from the rest of uniprot. as
	31	# a result, it is possible that 2 SQL queries will be required, one to
	32	# pfamA_reg_full_significant and a second to uniprot_reg_full.
	33
	34	# modified 15-Jan-2017 to reduce the number of calls when the same
	35	# accession is present multiple times. Accessions are saved in a hash
	36	# that ensures uniqueness. (Could also speed things up by creating a temporary table.)
	37	#
	38
	39	use warnings;
	40	use strict;
	41
	42	use DBI;
	43	use Getopt::Long;
	44	use Pod::Usage;
	45
	46	use vars qw($host $db $port $user $pass);
	47
	48	my $hostname = `/bin/hostname`;
	49
	50	($host, $db, $port, $user, $pass) = ("wrpxdb.its.virginia.edu", "pfam32", 0, "web_user", "fasta_www");
	51	#$host = 'xdb';
	52	#$host = 'localhost';
	53	#$db = 'RPD2_pfam28u';
	54
	55	my ($auto_reg,$rpd2_fams, $neg_doms, $vdoms, $lav, $no_doms, $no_clans, $pf_acc, $acc_comment, $bound_comment, $shelp, $help) =
	56	(0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0,);
	57	my ($no_over, $split_over, $over_fract) = (0, 0, 3.0);
	58	my ($clan_fam) = (0);
	59
	60	my ($color_sep_str, $show_color) = (" :",1);
	61	$color_sep_str = '~';
	62
	63	my ($min_nodom, $min_vdom) = (10,10);
	64
	65	GetOptions(
	66	"host=s" => \$host,
	67	"db=s" => \$db,
	68	"user=s" => \$user,
	69	"password=s" => \$pass,
	70	"port=i" => \$port,
	71	"lav" => \$lav,
	72	"acc_comment" => \$acc_comment,
	73	"bound_comment" => \$bound_comment,
	74	"color!" => \$show_color,
	75	"clan_fam\|clan-fam" => \$clan_fam,
	76	"no_over\|no-over" => \$no_over,
	77	"split_over\|split-over=f" => \$split_over,
	78	"over_fract\|over-fract=f" => \$over_fract,
	79	"no-clans\|no_clans" => \$no_clans,
	80	"neg\|neg_doms\|neg-doms" => \$neg_doms,
	81	"min_nodom=i" => \$min_nodom,
	82	"vdoms\|v_doms" => \$vdoms,
	83	"pfacc" => \$pf_acc,
	84	"RPD2" => \$rpd2_fams,
	85	"auto_reg" => \$auto_reg,
	86	"h\|?" => \$shelp,
	87	"help" => \$help,
	88	);
	89
	90	pod2usage(1) if $shelp;
	91	pod2usage(exitstatus => 0, verbose => 2) if $help;
	92	pod2usage(1) unless (@ARGV \|\| -p STDIN \|\| -f STDIN);
	93
	94	my $connect = "dbi:mysql(AutoCommit=>1,RaiseError=>1):database=$db";
	95	$connect .= ";host=$host" if $host;
	96	$connect .= ";port=$port" if $port;
	97
	98	my $dbh = DBI->connect($connect,
	99	$user,
	100	$pass
	101	) or die $DBI::errstr;
	102
	103	my %annot_types = ();
	104	my %domains = (NODOM=>0);
	105	my %domain_clan = (NODOM => {clan_id => 'NODOM', clan_acc=>0, domain_cnt=>0});
	106	my @domain_list = (0);
	107	my $domain_cnt = 0;
	108
	109	my $pfamA_reg_full = 'pfamA_reg_full_significant';
	110	my $uniprot_reg_full = 'uniprot_reg_full';
	111
	112	my $get_annot_sub = \&get_pfam_annots;
	113
	114	my @pfam_fields = qw(seq_start seq_end model_start model_end model_length pfamA_acc pfamA_id auto_pfamA_reg_full domain_evalue_score as evalue length);
	115	my @upfam_fields = qw(seq_start seq_end model_start model_end model_length pfamA_acc pfamA_id auto_uniprot_reg_full domain_evalue_score as evalue length);
	116
	117	my $get_pfam_acc = $dbh->prepare(<<EOSQL);
	118	SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
	119	FROM pfamseq
	120	JOIN pfamA_reg_full_significant using(pfamseq_acc)
	121	JOIN pfamA USING (pfamA_acc)
	122	WHERE in_full = 1
	123	AND pfamseq_acc=?
	124	ORDER BY seq_start
	125
	126	EOSQL
	127
	128	my $get_upfam_acc = $dbh->prepare(<<EOSQL);
	129	SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_uniprot_reg_full as auto_pfamA_reg_full, domain_evalue_score as evalue, length
	130	FROM uniprot
	131	JOIN uniprot_reg_full using(uniprot_acc)
	132	JOIN pfamA USING (pfamA_acc)
	133	WHERE in_full = 1
	134	AND uniprot_acc=?
	135	ORDER BY seq_start
	136
	137	EOSQL
	138
	139	my $get_pfam_refacc = $dbh->prepare(<<EOSQL);
	140	SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
	141	FROM $pfamA_reg_full
	142	JOIN pfamseq using(pfamseq_acc)
	143	JOIN pfamA USING (pfamA_acc)
	144	JOIN uniprot.up2ref_acc as up2ref on(up2ref.acc=pfamseq_acc)
	145	WHERE in_full = 1
	146	AND up2ref.ref_acc=?
	147	ORDER BY seq_start
	148
	149	EOSQL
	150
	151	my $get_upfam_refacc = $dbh->prepare(<<EOSQL);
	152	SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_uniprot_reg_full as auto_pfamA_reg_full, domain_evalue_score as evalue, length
	153	FROM uniprot
	154	JOIN uniprot_reg_full using(uniprot_acc)
	155	JOIN pfamA USING (pfamA_acc)
	156	JOIN uniprot.up2ref_acc as up2ref on(up2ref.acc=uniprot_acc)
	157	WHERE in_full = 1
	158	AND ref_acc=?
	159	ORDER BY seq_start
	160
	161	EOSQL
	162
	163	my $get_annots_sql = $get_pfam_acc;
	164
	165	my $get_pfam_id = $dbh->prepare(<<EOSQL);
	166	SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_pfamA_reg_full, domain_evalue_score as evalue, length
	167	FROM pfamseq
	168	JOIN $pfamA_reg_full using(pfamseq_acc)
	169	JOIN pfamA USING (pfamA_acc)
	170	WHERE in_full=1
	171	AND pfamseq_id=?
	172	ORDER BY seq_start
	173
	174	EOSQL
	175
	176	my $get_upfam_id = $dbh->prepare(<<EOSQL);
	177	SELECT seq_start, seq_end, model_start, model_end, model_length, pfamA_acc, pfamA_id, auto_uniprot_reg_full as auto_pfamA_reg_full, domain_evalue_score as evalue, length
	178	FROM uniprot
	179	JOIN uniprot_reg_full using(uniprot_acc)
	180	JOIN pfamA USING (pfamA_acc)
	181	WHERE in_full=1
	182	AND uniprot_id=?
	183	ORDER BY seq_start
	184
	185	EOSQL
	186
	187	my $get_pfam_clan = $dbh->prepare(<<EOSQL);
	188
	189	SELECT clan_acc, clan_id
	190	FROM clan
	191	JOIN clan_membership using(clan_acc)
	192	WHERE pfamA_acc=?
	193
	194	EOSQL
	195
	196	my $get_rpd2_clans = $dbh->prepare(<<EOSQL);
	197
	198	SELECT auto_pfamA, clan
	199	FROM ljm_db.RPD2_final_fams
	200	WHERE clan is not NULL
	201
	202	EOSQL
	203
	204	# -- LEFT JOIN clan_membership USING (auto_pfamA)
	205	# -- LEFT JOIN clans using(auto_clan)
	206
	207	my ($tmp, $gi, $sdb, $acc, $id, $use_acc);
	208
	209	################
	210	## check for db=*_qfo -- do not use get_upfam_acc in that case
	211	if ($db =~ m/_qfo/) {
	212	$get_upfam_acc= '';
	213	}
	214
	215	# get the query
	216	my ($query, $seq_len) = @ARGV;
	217	$seq_len = 0 unless defined($seq_len);
	218
	219	$query =~ s/^>// if ($query);
	220
	221	my @annots = ();
	222	my %annot_set = ();
	223
	224	my %rpd2_clan_fams = ();
	225
	226	if ($rpd2_fams) {
	227	$get_rpd2_clans->execute();
	228	my ($auto_pfam, $auto_clan);
	229	while (($auto_pfam, $auto_clan)=$get_rpd2_clans->fetchrow_array()) {
	230	$rpd2_clan_fams{$auto_pfam} = $auto_clan;
	231	}
	232	}
	233
	234	#if it's a file I can open, read and parse it
	235	unless ($query && ($query =~ m/[\\|:]/ \|\|
	236	$query =~ m/^[NX]P_/ \|\|
	237	$query =~ m/^[OPQ][0-9][A-Z0-9]{3}[0-9]\|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}\s/)) {
	238
	239	while (my $a_line = <>) {
	240	$a_line =~ s/^>//;
	241	chomp $a_line;
	242	push @annots, show_annots($a_line, $get_annot_sub);
	243	}
	244	}
	245	else {
	246	push @annots, show_annots("$query\t$seq_len", $get_annot_sub);
	247	}
	248
	249	for my $seq_annot (@annots) {
	250	next unless $seq_annot;
	251	my $annot_r = $annot_set{$seq_annot};
	252	print ">",$annot_r->{seq_info},"\n";
	253	for my $annot (@{$annot_r->{list}}) {
	254	if (!$lav && defined($domains{$annot->[-1]})) {
	255	my ($a_name, $a_num) = domain_num($annot->[-1],$domains{$annot->[-1]});
	256	$annot->[-1] = $a_name;
	257	my $tmp_a_num = $a_num;
	258	$tmp_a_num =~ s/v$//;
	259	if ($acc_comment) {
	260	$annot->[-1] .= "{$domain_list[$tmp_a_num]}";
	261	}
	262	if ($bound_comment) {
	263	$annot->[-1] .= $color_sep_str.$annot->[0].":".$annot->[2];
	264	}
	265	elsif ($show_color) {
	266	$annot->[-1] .= $color_sep_str.$a_num;
	267	}
	268	}
	269	print join("\t",@$annot),"\n";
	270	}
	271	}
	272
	273	exit(0);
	274
	275	sub show_annots {
	276	my ($query_len, $get_annot_sub) = @_;
	277
	278	my ($annot_line, $seq_len) = split(/\t/,$query_len);
	279
	280	my $pfamA_acc;
	281
	282	$use_acc = 1;
	283	$get_annots_sql = $get_pfam_acc;
	284
	285	my $get_annots_sql_u = $get_upfam_acc;
	286
	287	if ($annot_line =~ m/^pf\d+\|/) {
	288	($sdb, $gi, $pfamA_acc, $acc, $id) = split(/\|/,$annot_line);
	289	# $dbh->do("use RPD2_pfam");
	290	}
	291	elsif ($annot_line =~ m/^gi\|/) {
	292	($tmp, $gi, $sdb, $acc, $id) = split(/\|/,$annot_line);
	293	if ($sdb =~ m/ref/) {
	294	$get_annots_sql = $get_pfam_refacc;
	295	$get_annots_sql_u = $get_upfam_refacc;
	296	}
	297	}
	298	elsif ($annot_line =~ m/^(sp|tr|up)\|/) {
	299	($sdb, $acc, $id) = split(/\|/,$annot_line);
	300	}
	301	elsif ($annot_line =~ m/^ref\|/) {
	302	($sdb, $acc) = split(/\|/,$annot_line);
	303	$get_annots_sql = $get_pfam_refacc;
	304	$get_annots_sql_u = $get_upfam_refacc;
	305	}
	306	elsif ($annot_line =~ m/^(SP|TR):/i) {
	307	($sdb, $id) = split(/:/,$annot_line);
	308	$use_acc = 0;
	309	}
	310	elsif ($annot_line !~ m/\|/ && $annot_line !~ m/:/) {
	311	$use_acc = 1;
	312	($acc) = split(/\s+/,$annot_line);
	313	}
	314	# deal with no-database SwissProt/NR
	315	else {
	316	($acc)=($annot_line =~ /^(\S+)/);
	317	}
	318
	319	# here we have an $acc or an $id: check to see if we have the data
	320
	321	my %annot_data = (seq_info=>$annot_line, seq_len=>$seq_len);
	322	my $annot_key = '';
	323	unless ($use_acc) {
	324	next if ($annot_set{$id});
	325	$annot_set{$id} = \%annot_data;
	326	$annot_key = $id;
	327
	328	$get_annots_sql = $get_pfam_id;
	329	$get_annots_sql->execute($id);
	330	unless ($get_annots_sql->rows()) {
	331	if ($get_annots_sql_u) {
	332	$get_annots_sql = $get_annots_sql_u;
	333	$get_annots_sql->execute($id);
	334	}
	335	}
	336	} else {
	337	unless ($acc) {
	338	warn "missing acc in $annot_line";
	339	return "";
	340	}
	341	else {
	342	$acc =~ s/\.\d+$//;
	343
	344	$annot_key = $acc;
	345	if ($annot_set{$acc}) {
	346	goto ret_label;
	347	}
	348	$annot_set{$acc} = \%annot_data;
	349
	350	$get_annots_sql->execute($acc);
	351	unless ($get_annots_sql->rows()) {
	352	if ($get_annots_sql_u) {
	353	$get_annots_sql = $get_annots_sql_u;
	354	$get_annots_sql->execute($acc);
	355	}
	356	}
	357	}
	358	}
	359
	360	$annot_data{list} = $get_annot_sub->($get_annots_sql, $seq_len);
	361
	362	ret_label:
	363	return $annot_key;
	364	}
	365
	366	sub get_pfam_annots {
	367	my ($get_annots, $seq_length) = @_;
	368
	369	$seq_length = 0 unless $seq_length;
	370
	371	my @pf_domains = ();
	372
	373	# get the list of domains, sorted by start
	374
	375	# $row_href has: seq_start, seq_end, model_start, model_end, model_length,
	376	# pfamA_acc, pfamA_id, auto_pfamA_reg_full,
	377	# domain_evalue_score as evalue, length
	378
	379	while ( my $row_href = $get_annots->fetchrow_hashref()) {
	380	if ($auto_reg) {
	381	$row_href->{info} = $row_href->{auto_pfamA_reg_full};
	382	} elsif ($pf_acc) {
	383	$row_href->{info} = $row_href->{pfamA_acc};
	384	} else {
	385	$row_href->{info} = $row_href->{pfamA_id};
	386	}
	387
	388	if ($row_href && $row_href->{length} > $seq_length && $seq_length == 0) {
	389	$seq_length = $row_href->{length};
	390	}
	391
	392	next if ($row_href->{seq_start} >= $seq_length);
	393	if ($row_href->{seq_end} > $seq_length) {
	394	$row_href->{seq_end} = $seq_length;
	395	}
	396
	397	push @pf_domains, $row_href
	398	}
	399
	400	# before checking for domain overlap, check for "split-domains"
	401	# (self-unbound) by looking for runs of the same domain that are
	402	# ordered by model_start
	403
	404	if (scalar(@pf_domains) > 1) {
	405	my @j_domains; #joined domains
	406	my @tmp_domains = @pf_domains;
	407
	408	my $prev_dom = shift(@tmp_domains);
	409
	410	for my $curr_dom (@tmp_domains) {
	411	# to join domains:
	412	# (1) the domains must be in order by model_start/end coordinates
	413	# (3) joining the domains cannot make the total combination too long
	414
	415	# check for model and sequence consistency
	416	if (($prev_dom->{pfamA_acc} eq $curr_dom->{pfamA_acc}) # same family
	417	&& $prev_dom->{model_start} < $curr_dom->{model_start} # model check
	418	&& $prev_dom->{model_end} < $curr_dom->{model_end}
	419
	420	&& ($curr_dom->{model_start} > $prev_dom->{model_end} * 0.80 # limit overlap
	421	|| $curr_dom->{model_start} < $prev_dom->{model_end} * 1.25)
	422	&& ((($curr_dom->{model_end} - $curr_dom->{model_start}+1)/$curr_dom->{model_length} +
	423	($prev_dom->{model_end} - $prev_dom->{model_start}+1)/$prev_dom->{model_length}) < 1.33)
	424	) { # join them by updating $prev_dom
	425	$prev_dom->{seq_end} = $curr_dom->{seq_end};
	426	$prev_dom->{model_end} = $curr_dom->{model_end};
	427	$prev_dom->{auto_pfamA_reg_full} = $prev_dom->{auto_pfamA_reg_full} . ";". $curr_dom->{auto_pfamA_reg_full};
	428	$prev_dom->{evalue} = ($prev_dom->{evalue} < $curr_dom->{evalue} ? $prev_dom->{evalue} : $curr_dom->{evalue});
	429	} else {
	430	push @j_domains, $prev_dom;
	431	$prev_dom = $curr_dom;
	432	}
	433	}
	434	push @j_domains, $prev_dom;
	435	@pf_domains = @j_domains;
	436
	437
	438	if ($no_over) { # for either $no_over or $split_over, check for overlapping domains and edit/split them
	439
	440	my @tmp_domains = @pf_domains; # allow shifts from copy of @pf_domains
	441	my @save_domains = (); # where the new domains go
	442
	443	my $prev_dom = shift @tmp_domains;
	444
	445	while (my $curr_dom = shift @tmp_domains) {
	446
	447	my @overlap_domains = ($prev_dom);
	448
	449	my $diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
	450
	451	my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1,
	452	$curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
	453
	454	my $inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) # start is right && end is left
	455	&& ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) || # -- curr inside prev
	456	(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) # start is left && end is right
	457	&& ($curr_dom->{seq_end} >= $prev_dom->{seq_end}))); # -- prev is inside curr
	458
	459	my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
	460
	461	# check for overlap > domain_length/$over_fract
	462	while ($inclusion || ($diff > 0 && $diff > $longer_len/$over_fract)) {
	463	push @overlap_domains, $curr_dom;
	464	$curr_dom = shift @tmp_domains;
	465	last unless $curr_dom;
	466	$diff = $prev_dom->{seq_end} - $curr_dom->{seq_start};
	467	($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
	468	$longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
	469	$inclusion = ((($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})) ||
	470	(($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})));
	471	}
	472
	473	# check for overlapping domains; >1 because $prev_dom is always there
	474	if (scalar(@overlap_domains) > 1 ) {
	475	# if $rpd2_fams, check for a chosen one
	476
	477	for my $dom ( @overlap_domains) {
	478	$dom->{evalue} = 1.0 unless defined($dom->{evalue});
	479	}
	480
	481	@overlap_domains = sort { $a->{evalue} <=> $b->{evalue} } @overlap_domains;
	482	$prev_dom = $overlap_domains[0];
	483	}
	484
	485	# $prev_dom should be the best of the overlaps, and we are no longer overlapping > dom_length/3
	486	push @save_domains, $prev_dom;
	487	$prev_dom = $curr_dom;
	488	}
	489
	490	if ($prev_dom) {
	491	push @save_domains, $prev_dom;
	492	}
	493
	494	@pf_domains = @save_domains;
	495
	496	# now check for smaller overlaps
	497	for (my $i=1; $i < scalar(@pf_domains); $i++) {
	498	if ($pf_domains[$i-1]->{seq_end} >= $pf_domains[$i]->{seq_start}) {
	499	my $overlap = $pf_domains[$i-1]->{seq_end} - $pf_domains[$i]->{seq_start};
	500	$pf_domains[$i-1]->{seq_end} -= int($overlap/2);
	501	$pf_domains[$i]->{seq_start} = $pf_domains[$i-1]->{seq_end}+1;
	502	}
	503	}
	504	}
	505	elsif ($split_over) { # here, everything that overlaps by > $min_vdom should be split into a separate domain
	506	my @save_domains = (); # where the new domains go
	507
	508	# check to see if one domain is included (or overlapping) more
	509	# than xx% of the other. If so, pick the longer one
	510
	511	my ($prev_dom, $curr_dom) = ($pf_domains[0],0) ;
	512	for (my $i=1; $i < scalar(@pf_domains); $i++) {
	513	$curr_dom = $pf_domains[$i];
	514
	515	my ($prev_len, $cur_len) = ($prev_dom->{seq_end}-$prev_dom->{seq_start}+1, $curr_dom->{seq_end}-$curr_dom->{seq_start}+1);
	516	my $longer_len = ($prev_len > $cur_len) ? $prev_len : $cur_len;
	517
	518	if (($curr_dom->{seq_start} >= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} <= $prev_dom->{seq_end})
	519	&& $cur_len / $prev_len > 0.80) {
	520	# $prev_dom stays the same, $curr_dom deleted
	521	next;
	522	}
	523	elsif (($curr_dom->{seq_start} <= $prev_dom->{seq_start}) && ($curr_dom->{seq_end} >= $prev_dom->{seq_end})
	524	&& $prev_len / $cur_len > 0.80) {
	525	$prev_dom = $curr_dom; # this should delete $prev_dom
	526	next;
	527	}
	528
	529	if ($prev_dom->{seq_end} >= $curr_dom->{seq_start} + $min_vdom) {
	530	my ($l_seq_end, $r_seq_start) = ($curr_dom->{seq_start}-1, $prev_dom->{seq_end}+1);
	531
	532	$prev_dom->{seq_end} = $l_seq_end;
	533	push @save_domains, $prev_dom;
	534	my $new_dom = {seq_start => $l_seq_end+1, seq_end=>$r_seq_start-1,
	535	model_length => -1,
	536	pfamA_acc=>$prev_dom->{pfamA_acc}."/".$curr_dom->{pfamA_acc},
	537	pfamA_id=>$prev_dom->{pfamA_id}."/".$curr_dom->{pfamA_id},
	538	};
	539
	540	if ($pf_acc) {
	541	$new_dom->{info} = $new_dom->{pfamA_acc};
	542	}
	543	else {
	544	$new_dom->{info} = $new_dom->{pfamA_id};
	545	}
	546
	547	push @save_domains, $new_dom;
	548	$curr_dom->{seq_start} = $r_seq_start;
	549	$prev_dom = $curr_dom;
	550	}
	551	else {
	552	push @save_domains, $prev_dom;
	553	$prev_dom = $curr_dom;
	554	}
	555	}
	556	push @save_domains, $prev_dom;
	557	@pf_domains = @save_domains;
	558	}
	559	}
	560
	561	# $vdoms -- virtual Pfam domains -- the equivalent of $neg_doms,
	562	# but covering parts of a Pfam model that are not annotated. split
	563	# domains have been joined, so simply check beginning and end of
	564	# each domain (but must also check for bounded-ness)
	565	# only add when 10% or more is missing and missing length > $min_nodom
	566
	567	if ($vdoms && scalar(@pf_domains)) {
	568	my @vpf_domains;
	569
	570	my $curr_dom = $pf_domains[0];
	571	my $length = $curr_dom->{length};
	572
	573	my $prev_dom={seq_end=>0, pfamA_acc=>''};
	574	my $prev_dom_end = 0;
	575	my $next_dom_start = $length+1;
	576
	577	for (my $dom_ix=0; $dom_ix < scalar(@pf_domains); $dom_ix++ ) {
	578	$curr_dom = $pf_domains[$dom_ix];
	579
	580	my $pfamA = $curr_dom->{pfamA_acc};
	581
	582	# first, look left, is there a domain there (if there is,
	583	# it should be updated right
	584
	585	# my $min_vdom = $curr_dom->{model_length} / 10;
	586
	587	if ($curr_dom->{model_length} < $min_vdom) {
	588	push @vpf_domains, $curr_dom;
	589	next;
	590	}
	591	if ($prev_dom->{pfamA_acc}) { # look for previous domain
	592	$prev_dom_end = $prev_dom->{seq_end};
	593	}
	594
	595	# there is a domain to the left, how much room is available?
	596	my $left_dom_len = min($curr_dom->{seq_start}-$prev_dom_end-1, $curr_dom->{model_start}-1);
	597	if ( $left_dom_len > $min_vdom) {
	598	# there is room for a virtual domain
	599	my %new_dom = (seq_start=> $curr_dom->{seq_start}-$left_dom_len,
	600	seq_end => $curr_dom->{seq_start}-1,
	601	info=>'@'.$curr_dom->{info},
	602	model_length=>$curr_dom->{model_length},
	603	model_end => $curr_dom->{model_start}-1,
	604	model_start => $left_dom_len,
	605	pfamA_acc=>$pfamA,
	606	);
	607	push @vpf_domains, \%new_dom;
	608	}
	609
	610	# save the current domain
	611	push @vpf_domains, $curr_dom;
	612	$prev_dom = $curr_dom;
	613
	614	if ($dom_ix < $#pf_domains) { # there is a domain to the right
	615	# first, give all the extra space to the first domain (no splitting)
	616	$next_dom_start = $pf_domains[$dom_ix+1]->{seq_start};
	617	}
	618	else {
	619	$next_dom_start = $length;
	620	}
	621
	622	# is there room for a virtual domain right
	623
	624	my $right_dom_len = min($next_dom_start-$curr_dom->{seq_end}-1, # space available
	625	$curr_dom->{model_length}-$curr_dom->{model_end} # space needed
	626	);
	627	if ( $right_dom_len > $min_vdom) {
	628	my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
	629	seq_end=> $curr_dom->{seq_end}+$right_dom_len,
	630	info=>'@'.$curr_dom->{info},
	631	model_length => $curr_dom->{model_length},
	632	pfamA_acc=> $pfamA,
	633	);
	634	push @vpf_domains, \%new_dom;
	635	$prev_dom = \%new_dom;
	636	}
	637	} # all done, check for last one
	638
	639	# $curr_dom=$pf_domains[-1];
	640	# # my $min_vdom = $curr_dom->{model_length}/10;
	641
	642	# my $right_dom_len = min($length - $curr_dom->{seq_end}+1, # space available
	643	# $curr_dom->{model_length}-$curr_dom->{model_end} # space needed
	644	# );
	645	# if ($right_dom_len > $min_vdom) {
	646	# my %new_dom = (seq_start=> $curr_dom->{seq_end}+1,
	647	# seq_end => $curr_dom->{seq_end}+$right_dom_len,
	648	# info=>'@'.$curr_dom->{pfamA_acc},
	649	# model_len=> $curr_dom->{model_len},
	650	# pfamA_acc => $curr_dom->{pfamA_acc},
	651	# model_start => $curr_dom->{model_end}+1,
	652	# model_end => $curr_dom->{model_len},
	653	# );
	654
	655	# push @vpf_domains, \%new_dom;
	656	# }
	657
	658	# @vpf_domains has both old @pf_domains and new virtual domains
	659	@pf_domains = @vpf_domains;
	660	}
	661
	662	if ($neg_doms) {
	663	my @npf_domains;
	664	my $prev_dom={seq_end=>0};
	665	for my $curr_dom ( @pf_domains) {
	666	if ($curr_dom->{seq_start} - $prev_dom->{seq_end} > $min_nodom) {
	667	my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end => $curr_dom->{seq_start}-1, info=>'NODOM');
	668	push @npf_domains, \%new_dom;
	669	}
	670	push @npf_domains, $curr_dom;
	671	$prev_dom = $curr_dom;
	672	}
	673	if ($seq_length - $prev_dom->{seq_end} > $min_nodom) {
	674	my %new_dom = (seq_start=>$prev_dom->{seq_end}+1, seq_end=>$seq_length, info=>'NODOM');
	675	if ($new_dom{seq_end} > $new_dom{seq_start}) {
	676	push @npf_domains, \%new_dom;
	677	}
	678	}
	679
	680	if (scalar(@pf_domains)==0) {
	681	my %new_dom = (seq_start=>1, seq_end=> $seq_len, info=>'NODOM');
	682	push @pf_domains, \%new_dom;
	683	}
	684
	685	# @npf_domains has both old @pf_domains and new neg-domains
	686	@pf_domains = @npf_domains;
	687	}
	688
	689	# now make sure we have useful names: colors
	690
	691	for my $pf (@pf_domains) {
	692	$pf->{info} = domain_name($pf->{info}, $pf->{pfamA_acc});
	693	}
	694
	695	my @feats = ();
	696	for my $d_ref (@pf_domains) {
	697	if ($lav) {
	698	push @feats, [$d_ref->{seq_start}, $d_ref->{seq_end}, $d_ref->{info}];
	699	} else {
	700	push @feats, [$d_ref->{seq_start}, '-', $d_ref->{seq_end}, $d_ref->{info} ];
	701	# push @feats, [$d_ref->{seq_end}, ']', '-', ""];
	702	}
	703
	704	}
	705
	706	return \@feats;
	707	}
	708
	709	sub min {
	710	my ($arg1, $arg2) = @_;
	711
	712	return ($arg1 <= $arg2 ? $arg1 : $arg2);
	713	}
	714
	715	sub max {
	716	my ($arg1, $arg2) = @_;
	717
	718	return ($arg1 >= $arg2 ? $arg1 : $arg2);
	719	}
	720
	721	# domain name takes a uniprot domain label, removes comments ( ;
	722	# truncated) and numbers and returns a canonical form. Thus:
	723	# Cortactin 6.
	724	# Cortactin 7; truncated.
	725	# becomes "Cortactin"
	726	#
	727
	728	sub domain_name {
	729
	730	my ($value, $pfamA_acc) = @_;
	731	my $is_virtual = 0;
	732
	733	if ($value =~ m/^@/) {
	734	$is_virtual = 1;
	735	$value =~ s/^@//;
	736	}
	737
	738	# check for clan:
	739	if ($no_clans) {
	740	if (! defined($domains{$value})) {
	741	$domain_clan{$value} = 0;
	742	$domains{$value} = ++$domain_cnt;
	743	push @domain_list, $pfamA_acc;
	744	}
	745	}
	746	elsif (!defined($domain_clan{$value})) {
	747	## only do this for new domains, old domains have known mappings
	748
	749	## ways to highlight the same domain:
	750	# (1) for clans, substitute clan name for family name
	751	# (2) for clans, use the same color for the same clan, but don't change the name
	752	# (3) for clans, combine family name with clan name, but use colors based on clan
	753
	754	# check to see if it's a clan
	755	$get_pfam_clan->execute($pfamA_acc);
	756
	757	my $pfam_clan_href=0;
	758
	759	if ($pfam_clan_href=$get_pfam_clan->fetchrow_hashref()) { # is a clan
	760	my ($clan_id, $clan_acc) = @{$pfam_clan_href}{qw(clan_id clan_acc)};
	761
	762	# now check to see if we have seen this clan before (if so, do not increment $domain_cnt)
	763	my $c_value = "C." . $clan_id;
	764
	765	if ($clan_fam) {
	766	$c_value = $c_value;
	767	}
	768
	769	if ($pf_acc) {
	770	$c_value = $clan_acc;
	771	}
	772
	773	$domain_clan{$value} = {clan_id => $clan_id,
	774	clan_acc => $clan_acc};
	775
	776	if ($domains{$c_value}) {
	777	$domain_clan{$value}->{domain_cnt} = $domains{$c_value};
	778	$value = $c_value;
	779	}
	780	else {
	781	$domain_clan{$value}->{domain_cnt} = ++ $domain_cnt;
	782	$value = $c_value;
	783	$domains{$value} = $domain_cnt;
	784	push @domain_list, $pfamA_acc;
	785	}
	786	}
	787	else { # not a clan
	788	$domain_clan{$value} = 0;
	789	$domains{$value} = ++$domain_cnt;
	790	push @domain_list, $pfamA_acc;
	791	}
	792	}
	793	elsif ($domain_clan{$value} && $domain_clan{$value}->{clan_acc}) {
	794	if ($pf_acc) {$value = $domain_clan{$value}->{clan_acc};}
	795	else { $value = "C." . $domain_clan{$value}->{clan_id}; }
	796	}
	797
	798	if ($is_virtual) {
	799	$domains{'@'.$value} = $domains{$value};
	800	$value = '@'.$value;
	801	}
	802
	803	return $value;
	804	}
	805
	806	sub domain_num {
	807	my ($value, $number) = @_;
	808	if ($value =~ m/^@/) {
	809	$value =~ s/^@/v/;
	810	$number = $number."v";
	811	}
	812	return ($value, $number);
	813	}
	814
	815
	816	__END__
	817
	818	=pod
	819
	820	=head1 NAME
	821
	822	ann_pfam_sql.pl
	823
	824	=head1 SYNOPSIS
	825
	826	ann_pfam_sql.pl --neg-doms --vdoms 'sp|P09488|GSTM1_HUMAN' | accession.file
	827
	828	=head1 OPTIONS
	829
	830	-h short help
	831	--help include description
	832	--no-over : generate non-overlapping domains (equivalent to ann_pfam.pl)
	833	--split-over : overlaps of two domains generate a new hybrid domain
	834	--no-clans : do not use clans with multiple families from same clan
	835	--neg-doms : report domains between annotated domains as NODOM
	836	(also --neg, --neg_doms)
	837	--pfacc : report Pfam ACC (PF01234), rather than Pfam identifier (GST-N)
	838	--vdoms : produce "virtual domains" using model_start,
	839	model_end for partial pfam domains
	840	--min_nodom=10 : minimum length between domains for NODOM
	841
	842	--host, --user, --password, --port --db : info for mysql database
	843
	844	=head1 DESCRIPTION
	845
	846	C<ann_pfam_sql.pl> extracts domain information from the pfam mysql
	847	database. Currently, the program works with database
	848	sequence descriptions in several formats:
	849
	850	>gi|1705556|sp|P54670.1|CAF1_DICDI
	851	>sp|P09488|GSTM1_HUMAN
	852	>sp:CALM_HUMAN
	853
	854	C<ann_pfam_sql.pl> uses the C<pfamA_reg_full_significant>, C<pfamseq>,
	855	and C<pfamA> tables of the C<pfam> database to extract domain
	856	information on a protein.
	857
	858	If the C<--no-over> option is set, overlapping domains are selected and
	859	edited to remove overlaps. For proteins with multiple overlapping
	860	domains (domains overlap by more than 1/3 of the domain length),
	861	C<ann_pfam_sql.pl> selects the domain annotation with the best
	862	C<domain_evalue_score>. When domains overlap by less than 1/3 of the
	863	domain length, they are shortened to remove the overlap.
	864
	865	If the C<--split-over> option is set, if two domains overlap, the
	866	overlapping region is split out of the domains and labeled as a new,
	867	virtual-like, domain. If one domain is internal to another and spans
	868	80% of the domain, the shorter domain is removed.
	869
	870	C<ann_pfam_sql.pl> is designed to be used by the B<FASTA> programs with
	871	the C<-V \!ann_pfam_sql.pl> or C<-V "\!ann_pfam_sql.pl --neg"> option.
	872
	873	=head1 AUTHOR
	874
	875	William R. Pearson, wrp@virginia.edu
	876
	877	=cut
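
The `--no-over` description above says that small overlaps are removed by shortening both domains. The following is a minimal, self-contained sketch of that trimming step, modeled on the "now check for smaller overlaps" loop in `get_pfam_annots`. The helper name `trim_overlaps` and the sample coordinates are assumptions made for illustration; they are not part of `ann_pfam_sql.pl`.

```perl
use strict;
use warnings;

# Hypothetical helper (not in ann_pfam_sql.pl): shorten neighbouring
# domains, already sorted by seq_start, so that they no longer overlap.
sub trim_overlaps {
    my @doms = map { +{ %$_ } } @_;    # shallow copies; inputs are not modified
    for my $i (1 .. $#doms) {
        my ($left, $right) = @doms[$i - 1, $i];
        next if $left->{seq_end} < $right->{seq_start};    # no overlap

        # give up half of the overlap on the left, the rest on the right
        my $overlap = $left->{seq_end} - $right->{seq_start};
        $left->{seq_end} -= int($overlap / 2);
        $right->{seq_start} = $left->{seq_end} + 1;
    }
    return @doms;
}

# illustrative coordinates only
my @trimmed = trim_overlaps(
    { seq_start => 10, seq_end => 100, info => 'A' },
    { seq_start => 91, seq_end => 180, info => 'B' },
);
printf "%s:%d-%d\n", @{$_}{qw(info seq_start seq_end)} for @trimmed;
# prints:
# A:10-96
# B:97-180
```

The sketch copies each record before editing it, whereas the script edits `@pf_domains` in place; the arithmetic is the same.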

+3

-2

scripts/ann_pfam_www.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014, 2015 by William R. Pearson and The Rector &

30	30	# >pf26|164|O57809|1A1D_PYRHO
31	31	# and only provides domain information
32	32
	33	use warnings;
33	34	# use strict;
34	35
35	36	use Getopt::Long;

79	80	my @domain_list = (0);
80	81	my $domain_cnt = 0;
81	82
82		my $loc="http://pfam.xfam.org/";
	83	my $loc="https://pfam.xfam.org/";
83	84	my $url;
84	85
85	86	my @pf_domains;

+1

-1

scripts/ann_script_list

0		ann_exons_ens.pl
	0	ann_exons_up_sql.pl
1	1	ann_exons_up_www.pl
2	2	ann_feats2ipr.pl
3	3	ann_feats_up_sql.pl

+2

-1

scripts/ann_upfeats_pfam_www_e.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

28	28
29	29	# this version can read feature2 uniprot features (acc/pos/end/label/value), but returns sorted start/end domains
30	30
	31	use warnings;
31	32	use strict;
32	33
33	34	use Getopt::Long;

+137

-55

scripts/annot_blast_btop2.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3		# copyright (c) 2014,2015 by William R. Pearson and The Rector &
	3	# copyright (c) 2017,2018 by William R. Pearson and The Rector &
4	4	# Visitors of the University of Virginia */
5	5	################################################################
6	6	# Licensed under the Apache License, Version 2.0 (the "License");

17	17	################################################################
18	18
19	19	################################################################
20		# annot_blast_btop2.pl --query query.file --ann_script ann_pfam_www.pl blast_tab_btop_file
	20	# annot_blast_btop2.pl --query query.file --ann_script ann_pfam_www.pl --include_doms blast_tab_btop_file
21	21	################################################################
22	22	# annot_blast_btop2.pl associates domain annotation information and
23	23	# subalignment scores with a blast tabular (-outfmt 6 or -outfmt 7)

29	29	# If the BTOP field or query_file is not available, the script
30	30	# produces domain content without sub-alignment scores.
31	31	################################################################
	32	## 4-Nov-2018
	33	# add --include_doms, which adds a new field with the coordinates of
	34	# the domains in the protein (independent of alignment)
	35	#
	36	################################################################
	37	## 21-July-2018
	38	# include sequence length (actually alignment end) to produce NODOM's (no NODOM's without length).
	39	#
	40	################################################################
32	41	## 13-Jan-2017
33	42	# modified to provide query/subject coordinates and identities if no
34	43	# query sequence -- does not decrement for reverse-complement fastx/blastx DNA

40	49	# add -q_annot_script to annotate query sequence
41	50	#
42	51
	52	use warnings;
43	53	use strict;
44	54	use IPC::Open2;
45	55	use Pod::Usage;
46	56	use Getopt::Long;
	57	use File::Temp qw/ tempfile /;
	58
47	59	# use Data::Dumper;
48	60
49	61	# read lines of the form:

55	67	# and report the domain content ala -m 8CC
56	68
57	69	my ($matrix, $ann_script, $q_ann_script, $show_raw, $shelp, $help) = ("BLOSUM62", "", "", 0, 0, 0);
	70	my ($have_qslen, $dom_info, $sub2query) = (0,0,0); # blast tabular file has sseqid sseqlen qseqid qseqlen
58	71	my ($query_lib_name) = (""); # if $query_lib_name, do not use $query_file_name
59	72	my ($out_field_str) = ("");
60	73	my $query_lib_r = 0;

67	80
68	81	GetOptions(
69	82	"matrix:s" => \$matrix,
70		"ann_script:s" => \$ann_script,
71		"q_ann_script:s" => \$q_ann_script,
	83	"ann_script|script:s" => \$ann_script,
	84	"q_ann_script|q_script:s" => \$q_ann_script,
	85	"have_qslen|have_sqlen!" => \$have_qslen,
	86	"domain_info|dom_info!" => \$dom_info,
	87	"sub2query!" => \$sub2query,
72	88	"query:s" => \$query_lib_name,
73	89	"query_file:s" => \$query_lib_name,
74	90	"query_lib:s" => \$query_lib_name,
75	91	"out_fields:s" => \$out_field_str,
76		"script:s" => \$ann_script,
77		"q_script:s" => \$q_ann_script,
78	92	"raw_score" => \$show_raw,
79	93	"h|?" => \$shelp,
80	94	"help" => \$help,

92	106
93	107	my @tab_fields = qw(q_seqid s_seqid percid alen mismatch gopen q_start q_end s_start s_end evalue bits score BTOP);
94	108
	109	if ($have_qslen) {
	110	@tab_fields = qw(q_seqid q_len s_seqid s_len percid alen mismatch gopen q_start q_end s_start s_end evalue bits score BTOP);
	111	}
	112
95	113	# the fields that are displayed are listed here. By default, all fields except score and BTOP are displayed.
96	114	my @out_tab_fields = @tab_fields[0 .. $#tab_fields-1];
	115
97	116	if ($show_raw) {
98	117	push @out_tab_fields, "raw_score";
99
100		}
	118	}
	119
101	120	if ($out_field_str) {
102	121	@out_tab_fields = split(/\s+/,$out_field_str);
103	122	}

134	153	push @hit_list, \%hit_data;
135	154	}
136	155
137		# get the current query sequence
	156	# get the query annotations
	157	if ($q_ann_script) {
	158	$q_ann_script =~ s/\+/ /g;
	159	}
	160
138	161	if ($q_ann_script && -x (split(/\s+/,$q_ann_script))[0]) {
139	162	# get the domains for the q_seqid using --q_ann_script
140	163	#

142	165	my $pid = open2($Reader, $Writer, $q_ann_script);
143	166	my $hit = $hit_list[0];
144	167
145		print $Writer $hit->{q_seqid},"\n";
	168	my $q_seq_len = scalar(@{$query_lib_r->{$hit->{q_seqid}}});
	169	print $Writer $hit->{q_seqid},"\t",$q_seq_len,"\n";
146	170	close($Writer);
147	171
148		@q_hit_list = ({ s_seq_id=> $hit->{q_seqid} });
	172	push @q_hit_list,{ s_seq_id=> $hit->{q_seqid}, s_end=> $q_seq_len};
149	173
150	174	read_annots($Reader, \@q_hit_list, 0);
151	175
152	176	waitpid($pid, 0);
153	177	}
154	178
155		# get the current query sequence
	179	# get the subject annotations
	180	if ($ann_script) {
	181	$ann_script =~ s/\+/ /g;
	182	}
	183
156	184	if ($ann_script && -x (split(/\s+/,$ann_script))[0]) {
157	185	# get the domains for each s_seqid using --ann_script
158	186	#
	187	# this does not work currently because only one accession is sent.
	188	# For multiple hits, I need to make a tmp_file.
	189
159	190	my ($Reader, $Writer);
160	191	my $pid = open2($Reader, $Writer, $ann_script);
	192
161	193	for my $hit (@hit_list) {
162		print $Writer $hit->{s_seqid},"\n";
	194	# print STDERR $hit->{s_seqid},"\t", $hit->{s_end},"\n";
	195	# print $Writer $hit->{s_seqid},"\t", $hit->{s_end},"\n";
	196	my $s_len = 100000;
	197	if ($have_qslen) {
	198	$s_len = $hit->{s_len};
	199	}
	200	print $Writer $hit->{s_seqid},"\t", $s_len,"\n";
163	201	}
164	202	close($Writer);
165	203

174	212	@header_lines = ($next_line);
175	213
176	214	# now get query sequence if available
	215
	216	if ($sub2query && scalar(@q_hit_list)==0) {
	217	# copy the information from $hit_list
	218	for my $tmp_hit ( @hit_list ) {
	219	if ($tmp_hit->{q_seqid} eq $tmp_hit->{s_seqid}) {
	220	my %tmp_q_hit = (s_seq_id=> $tmp_hit->{q_seqid}, s_end=> $tmp_hit->{s_len});
	221
	222	$tmp_q_hit{'domains'} = [];
	223	for my $dom ( @{$tmp_hit->{domains}} ) {
	224	my %new_dom = map { $_ => $dom->{$_} } keys(%$dom);
	225	$new_dom{target} = 0;
	226	push @{$tmp_q_hit{'domains'}}, \%new_dom;
	227	}
	228
	229	$tmp_q_hit{'sites'} = [];
	230	for my $site ( @{$tmp_hit->{sites}} ) {
	231	my %new_site = map { $_ => $site->{$_} } keys(%$site);
	232	$new_site{target} = 0;
	233	push @{$tmp_q_hit{'sites'}}, \%new_site;
	234	}
	235	push @q_hit_list,\%tmp_q_hit;
	236	last;
	237	}
	238	}
	239	}
177	240
178	241	my $q_hit = $q_hit_list[0];
179	242

237	300
238	301	if (scalar(@$merged_annots_r)) { # show subalignment scores if available
239	302	print "\t";
240
241	303	print format_annot_info($hit, $merged_annots_r);
	304	if ($dom_info) {
	305	print "\t",format_dom_info($q_hit->{domains}, $hit->{domains});
	306	}
242	307	}
243	308	elsif (@list_covered) { # otherwise show domain content
244	309	print "\t",join(";",@list_covered);
245		}
	310	if ($dom_info) {
	311	print "\t",format_dom_info($q_hit->{domains}, $hit->{domains});
	312	}
	313	}
	314
246	315	print "\n";
247	316	}
248	317

275	344	while (my $line = <$Reader>) {
276	345	next if $line=~ m/^=/;
277	346	chomp $line;
	347
	348	# print STDERR "$line\n";
278	349
279	350	# check for header
280	351	if ($line =~ m/^>/) {

289	360	}
290	361	@hit_domains = (); # current domains
291	362	@hit_sites = (); # current sites
292		$current_domain = $line;
	363	$current_domain = (split(/\s+/,$line))[0];
293	364	$current_domain =~ s/^>//;
294	365	} else { # check for data
295	366	my %annot_info = (target=>$target);

308	379	}
309	380	close($Reader);
310	381
311		# all done, save the last one
312	382	$hit_list_r->[$hit_ix]{domains} = \@hit_domains;
313	383	$hit_list_r->[$hit_ix]{sites} = \@hit_sites;
	384
	385	# clean up NODOMs in {domains}
	386	for my $hit ( @$hit_list_r ) {
	387	# clean-up last NODOM if < 10
	388	my $tmp_domains = $hit->{domains};
	389	next unless (scalar(@{$tmp_domains}));
	390	my ($last_dom, $left_coord) = ($tmp_domains->[-1], $hit->{s_end});
	391	if ($last_dom->{descr} =~ m/^NODOM/ && (($left_coord - $last_dom->{d_pos} + 1) < 10)) {
	392	pop @$tmp_domains;
	393	}
	394	}
314	395	}
315	396
316	397	# input: a blast BTOP string of the form: "1VA160TS7KG10RK27"
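
The BTOP string mentioned in the comment above alternates between run lengths of identical positions and two-character pairs (query residue, then subject residue, with `-` marking a gap). As a hedged illustration of how such a string can be tokenized, the sketch below counts identities, mismatches, and gap positions. The helper `btop_summary` is hypothetical and is not taken from `annot_blast_btop2.pl`, which also scores each position.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a protein BTOP string.
sub btop_summary {
    my ($btop) = @_;
    my ($ident, $mismatch, $gap) = (0, 0, 0);

    # a token is either a run of digits or a two-character residue pair
    for my $tok ($btop =~ m/(\d+|[A-Za-z*-]{2})/g) {
        if    ($tok =~ m/^\d+$/) { $ident += $tok; }    # run of identical positions
        elsif ($tok =~ m/-/)     { $gap++; }            # gap in query or subject
        else                     { $mismatch++; }       # substitution
    }
    return { ident => $ident, mismatch => $mismatch, gap => $gap,
             alen  => $ident + $mismatch + $gap };
}

my $s = btop_summary("1VA160TS7KG10RK27");
printf "alen=%d ident=%d mismatch=%d gap=%d\n", @{$s}{qw(alen ident mismatch gap)};
# prints: alen=209 ident=205 mismatch=4 gap=0
```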

416	497	$blosum62[22] = [ qw( 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1 -1 -1 -4) ];
417	498	$blosum62[23] = [ qw( -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1) ];
418	499
419
420	500	die "blosum62 length mismatch $#blosum62 != $#ncbi_blaa" if (scalar(@blosum62) != scalar(@ncbi_blaa));
421	501
422	502	for (my $i=0; $i < scalar(@ncbi_blaa); $i++) {

499	579	my @aligned_domains = ();
500	580
501	581	my $left_active_end = $domain_r->[-1]->{d_end}+1; # as far right as possible
	582	my $left_align_end = $hit_r->{q_end};
	583	if ($target) {
	584	$left_align_end = $hit_r->{s_end};
	585	}
	586
	587	if ($left_active_end > $left_align_end ) {
	588	$left_active_end = $left_align_end ;
	589	}
	590
502	591	my ($q_start, $s_start, $h_start, $h_end) = @{$hit_r}{qw(q_start s_start s_start s_end)};
503		my ($qix, $six) = ($q_start, $s_start); # $qix now starts from 1, like $ssix;
	592	my ($qix, $six) = ($q_start, $s_start); # $qix now starts from 1, like $six;
504	593
505	594	my $ds_ix = \$six; # use to track the subject position
506	595	# reverse coordinate names if $target==0

1137	1226	return \@merged_array;
1138	1227	}
1139	1228
1140		# domain output formatter
	1229	####
	1230	# print raw domain info:
	1231	# |DX:%d-%d;C=dom_info|XD:%d-%d;C=dom_info
	1232	#
1141	1233	sub format_dom_info {
1142		my ($hit_r, $raw_score, $dom_r) = @_;
1143
1144		unless ($raw_score) {
1145		warn "no raw_score at: ".$hit_r->{s_seqid}."\n";
1146		$raw_score = $hit_r->{score};
1147		}
1148
1149		my ($score_scale, $fsub_score) = ($hit_r->{score}/$raw_score, $dom_r->{score}/$raw_score);
1150
1151		my $qval = 0.0;
1152		if ($hit_r->{evalue} == 0.0) {
1153		$qval = 3000.0
1154		}
1155		else {
1156		$qval = -10.0*log($hit_r->{evalue})*$fsub_score/(log(10.0))
1157		}
1158
1159		my ($ns_score, $s_bit) = (int($dom_r->{score} * $score_scale+0.5),
1160		int($hit_r->{bits} * $fsub_score +0.5),
1161		);
1162		$qval = 0 if $qval < 0;
1163
1164		# print join(":",($dom_r->{ad_pos},$dom_r->{ad_end},$ns_score, $s_bit, sprintf("%.1f",$qval))),"\n";
1165		return join(";",(sprintf("|XR:%d-%d:%d-%d:s=%d",
1166		$dom_r->{qa_start},$dom_r->{qa_end},
1167		$dom_r->{sa_start},$dom_r->{sa_end},$ns_score),
1168		sprintf("b=%.1f",$s_bit),
1169		sprintf("I=%.3f",$dom_r->{percid}),
1170		sprintf("Q=%.1f",$qval),$dom_r->{descr}));
	1234	my ($q_dom_r, $dom_r) = @_;
	1235
	1236	my $dom_str = "";
	1237	for my $dom ( @$q_dom_r ) {
	1238	$dom_str .= sprintf("|DX:%d-%d;C=%s",@{$dom}{qw(d_pos d_end descr)});
	1239	}
	1240	for my $dom ( @$dom_r ) {
	1241	$dom_str .= sprintf("|XD:%d-%d;C=%s",@{$dom}{qw(d_pos d_end descr)});
	1242	}
	1243
	1244	return $dom_str;
1171	1245	}
1172	1246
1173	1247	# merged annot output formatter
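
The rewritten `format_dom_info` emits one `DX` item per query domain and one `XD` item per subject domain, each as `start-end;C=description`, joined with a vertical bar. A consumer of that field might split it back into records roughly as follows. This is a sketch under assumptions: the helper `parse_dom_info` and the sample domain names are hypothetical, and the pattern would not cope with a description that itself contains a vertical bar.

```perl
use strict;
use warnings;

# Hypothetical consumer-side helper: split a dom_info field into records.
sub parse_dom_info {
    my ($field) = @_;
    my @doms;
    for my $item (grep { length } split(/\|/, $field)) {
        # DX items come from the query list, XD items from the subject list
        next unless $item =~ m/^(DX|XD):(\d+)-(\d+);C=(.*)$/;
        push @doms, { side  => ($1 eq 'DX' ? 'query' : 'subject'),
                      start => $2, end => $3, descr => $4 };
    }
    return \@doms;
}

# illustrative input only
my $doms = parse_dom_info("|DX:4-82;C=GST_N|XD:1-3;C=NODOM|XD:4-80;C=GST_N");
printf "%s\t%d\t%d\t%s\n", @{$_}{qw(side start end descr)} for @$doms;
```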

1195	1269	if ($annot_r->{type} eq '-') { # domain with scores
1196	1270	my $fsub_score = $annot_r->{score}/$raw_score;
1197	1271
	1272	my ($ns_score, $s_bit) = (int($annot_r->{score} * $score_scale + 0.5),
	1273	int($hit_r->{bits} * $fsub_score + 0.5),
	1274	);
1198	1275	my $qval = 0.0;
1199	1276	if ($hit_r->{evalue} == 0.0) {
1200		$qval = 3000.0
	1277	if ($s_bit > 50) {
	1278	$qval = 3000.0
	1279	}
	1280	else {
	1281	$qval = -10.0 * (log(400.0 * 400.) + $s_bit)/log(10.0);
	1282	}
1201	1283	} else {
1202	1284	$qval = -10.0*log($hit_r->{evalue})*$fsub_score/(log(10.0))
1203	1285	}
1204	1286
1205		my ($ns_score, $s_bit) = (int($annot_r->{score} * $score_scale+0.5),
1206		int($hit_r->{bits} * $fsub_score +0.5),
1207		);
1208	1287	$qval = 0 if $qval < 0;
1209	1288
1210	1289	$annot_str .= join(";",(sprintf("|%s:%d-%d:%d-%d:s=%d",

1213	1292	$annot_r->{sa_start},$annot_r->{sa_end},$ns_score),
1214	1293	sprintf("b=%.1f",$s_bit),
1215	1294	sprintf("I=%.3f",$annot_r->{percid}),
1216		sprintf("Q=%.1f",$qval),$annot_r->{descr}));
	1295	sprintf("Q=%.1f",$qval),"C=".$annot_r->{descr}));
1217	1296	}
1218	1297	else { # site annotation
1219	1298	my $ann_type = $annot_r->{type};

1252	1331
1253	1332	--ann_script -- annotation script returning site/domain locations for subject sequences
1254	1333	-- same as --script
	1334
	1335	--have_qslen -- use a blast tabular format that includes the query and subject sequence lengths:
	1336	-- q_seqid q_len s_seqid s_len ...
1255	1337
1256	1338	--q_ann_script -- annotation script for query sequences
1257	1339	-- same as --q_script
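
The Q-value computed in the domain formatter above is a Phred-like score: -10*log10(E) for the whole alignment, scaled by the fraction of the raw score that falls in the sub-alignment. The following is a minimal Python 3 sketch of that arithmetic only, not the Perl code from the commit. The function name `domain_qval` and the `max_q` parameter are hypothetical, and the sketch uses the simple cap of 3000 when the E-value is zero rather than the bit-score branch added in this commit.

```python
import math

def domain_qval(evalue, dom_score, raw_score, max_q=3000.0):
    """Phred-like Q for a sub-alignment (hypothetical helper, illustration only)."""
    if evalue == 0.0:
        # log(0) is undefined, so cap the value as the script does
        return max_q
    # fraction of the alignment's raw score contributed by this domain
    fsub_score = dom_score / raw_score
    qval = -10.0 * math.log(evalue) * fsub_score / math.log(10.0)
    # negative values (E-value > 1) are clamped to zero
    return max(qval, 0.0)

if __name__ == "__main__":
    print("%.1f" % domain_qval(1e-50, 200, 400))
```

With an E-value of 1e-50 and a domain that accounts for half of the raw score, the sketch yields a Q of about 250.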

+68

-0

scripts/blastp_annot_cmd.sh

	0	#!/bin/bash
	1
	2	cmd="";
	3	for i in "$@"
	4	do
	5	case $i in
	6	-o=*|--outname=*)
	7	OUTNAME="${i#*=}"
	8	shift # past argument=value
	9	;;
	10	-q=*|--query=*)
	11	QUERY="${i#*=}"
	12	cmd="$cmd -query $QUERY"
	13	shift # past argument=value
	14	;;
	15	--ann_script=*)
	16	ANN_SCRIPT="${i#*=}"
	17	shift
	18	;;
	19	--q_ann_script=*)
	20	Q_ANN_SCRIPT="${i#*=}"
	21	shift
	22	;;
	23	*)
	24	cmd="$cmd $i"
	25	;;
	26	esac
	27	done
	28
	29	# echo "OUTNAME: " $OUTNAME
	30	# echo "CMD: " $cmd
	31
	32	if [[ $OUTNAME == '' ]]; then
	33	OUTNAME=${QUERY}_out
	34	fi
	35
	36	#if [[ $ANN_SCRIPT == '' ]]; then
	37	# ANN_SCRIPT="/seqprg/bin/ann_pfam30.pl --db=pfam31_qfo --host=localhost --neg --vdoms --acc_comment"
	38	#fi
	39
	40
	41	# echo "OUTNAME2: " $OUTNAME
	42
	43	bl_asn="$OUTNAME.asn"
	44	bl0_out="$OUTNAME.html"
	45	bla_out="${OUTNAME}_an.html"
	46	blm_out="$OUTNAME.msa"
	47	blt_out="$OUTNAME.bl_tab"
	48	blt_ann="$OUTNAME.bl_tab_ann"
	49	blr_out="$OUTNAME.bl_tab_rn"
	50
	51	# echo "tmp_files:"
	52	# echo $bl_asn $bl0_out $bla_out $blt_out
	53
	54	# echo "OUTFILE = ${OUTNAME}"
	55
	56	#export BLAST_PATH="/ebi/extserv/bin/ncbi-blast+/bin"
	57	export BLAST_PATH="/seqprg/bin"
	58
	59	$BLAST_PATH/blastp -outfmt 11 $cmd > $bl_asn
	60	$BLAST_PATH/blast_formatter -archive $bl_asn -outfmt 0 -html > $bl0_out
	61	$BLAST_PATH/blast_formatter -archive $bl_asn -outfmt '7 qseqid qlen sseqid slen pident length mismatch gapopen qstart qend sstart send evalue bitscore score btop' > $blt_out
	62	annot_blast_btop2.pl --query $QUERY --have_qslen --dom_info --ann_script "$ANN_SCRIPT" --q_ann_script "$Q_ANN_SCRIPT" $blt_out > $blt_ann
	63
	64	rename_exons.py --have_qslen --dom_info $blt_ann > $blr_out
	65	merge_blast_btab.pl --plot_url="plot_domain6t.cgi" --have_qslen --dom_info --btab $blr_out $bl0_out
	66
	67	# $BLAST_PATH/blast_formatter -archive $bl_asn -outfmt 2 > $blm_out

+1

-1

scripts/blastp_cmd.sh

25	25
26	26	$BLAST_PATH/blastp -outfmt 11 $cmd > $bl_asn
27	27	$BLAST_PATH/blast_formatter -archive $bl_asn -outfmt 0 -html > $bl0_out
28		$BLAST_PATH/blast_formatter -archive $bl_asn -outfmt '7 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore score btop' > $blt_out
	28	$BLAST_PATH/blast_formatter -archive $bl_asn -outfmt '7 qseqid qlen sseqid slen pident length mismatch gapopen qstart qend sstart send evalue bitscore score btop' > $blt_out
29	29	$BLAST_PATH/blast_formatter -archive $bl_asn -outfmt 2 > $blm_out
30	30

+2

-1

scripts/exp_up_ensg.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2010, 2014 by William R. Pearson and The Rector &

34	34	## sequences from an NCBI blast-formatted database.
35	35	##
36	36
	37	use warnings;
37	38	use strict;
38	39	use DBI;
39	40

+2

-1

scripts/expand_links.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2010, 2014 by William R. Pearson and The Rector &

34	34	## sequences from an NCBI blast-formatted database.
35	35	##
36	36
	37	use warnings;
37	38	use strict;
38	39	use DBI;
39	40

+2

-1

scripts/expand_uniref50.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2010, 2014 by William R. Pearson and The Rector &

24	24	# (2) take the uniprot accessions and produce a fasta library file
25	25	# from them
26	26
	27	use warnings;
27	28	use strict;
28	29	use DBI;
29	30

+205

-0

scripts/expand_up_isoforms.pl

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2010, 2014 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia */
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	## usage - expand_up_isoforms.pl [--prim_acc] up_hits.file > up_isoforms.file
	20	##
	21	## take a fasta36 -e expand.sh result file of the form:
	22	## sp|P09488|GSTM1_HUMAN<tab>1.1e-50
	23	##
	24	## and extract the accession number, looking it up in an SQL
	25	## table $table -- in this case "annot2_iso" to provide Uniprot
	26	## isoforms based on a uniprot accession.
	27	##
	28	## if --prim_acc, then the primary accession (used to find the isoforms) is added to the isoform seq_id, e.g.
	29	## sp|P09488|GSTM1_HUMAN has isoforms: with --prim_acc, the identifiers become
	30	## >iso|E7EWW9|E7EWW9_HUMAN >iso|E7EWW9|E7EWW9_HUMAN_P09488
	31	## >iso|H3BRM6|H3BRM6_HUMAN >iso|H3BRM6|H3BRM6_HUMAN_P09488
	32	## >iso|H3BQT3|H3BQT3_HUMAN >iso|H3BQT3|H3BQT3_HUMAN_P09488
	33
	34	use warnings;
	35	use strict;
	36	use Getopt::Long;
	37	use Pod::Usage;
	38	use DBI;
	39
	40	my ($host, $db, $port, $user, $pass) = ("xdb", "uniprot", 0, "web_user", "fasta_www");
	41	$host = 'wrpxdb.its.virginia.edu';
	42	my ($a_table, $i_table) = ("annot2", "annot2_iso");
	43	my ($help, $shelp) = (0,0);
	44	my ($e_thresh, $prim_acc) = (1e-6, 0);
	45
	46	GetOptions(
	47	"h" => \$shelp,
	48	"help" => \$help,
	49	"host=s" => \$host,
	50	"prim_acc!" => \$prim_acc,
	51	"db=s" => \$db,
	52	"expect|evalue|e_thresh=f" => \$e_thresh,
	53	"user=s" => \$user,
	54	"password=s" => \$pass,
	55	"port=i" => \$port,
	56	"i_table=s" => \$i_table,
	57	"a_table=s" => \$a_table,
	58	);
	59
	60	pod2usage(1) if $shelp;
	61	pod2usage(exitstatus => 0, verbose => 2) if $help;
	62	pod2usage(1) unless (@ARGV || -p STDIN || -f STDIN);
	63
	64	my $dbh = DBI->connect("dbi:mysql:host=$host:$db",
	65	$user, $pass,
	66	{ RaiseError => 1, AutoCommit => 1}
	67	) or die $DBI::errstr;
	68
	69	my %sth = (
	70	seed2link_acc => "SELECT acc FROM $i_table WHERE prim_acc=?",
	71	seed2link_id => "SELECT iso_a.acc FROM $i_table as iso_a JOIN $a_table as an2 on(iso_a.prim_acc=an2.acc) where an2.id=?",
	72	link2seq => "SELECT db, acc, prim_acc, id, descr, seq FROM annot2_iso JOIN protein_iso USING(acc) WHERE acc=?"
	73	);
	74
	75	for my $sth (keys(%sth)) {
	76	$sth{$sth} = $dbh->prepare($sth{$sth});
	77	}
	78
	79	my %acc_uniq = ();
	80
	81	# get the query
	82	my ($query, $eval_arg) = @ARGV;
	83	$eval_arg = 1e-10 unless $eval_arg;
	84	$query =~ s/^>// if ($query);
	85	my @link_lines = ();
	86
	87	#if it's a file I can open, read and parse it
	88	unless ($query && ($query =~ m/[\|:]/ ||
	89	$query =~ m/^[NX]P_/ ||
	90	$query =~ m/^[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}\s/)) {
	91
	92	while (my $a_line = <>) {
	93	$a_line =~ s/^>//;
	94	chomp $a_line;
	95	push @link_lines, $a_line;
	96	}
	97	}
	98	else {
	99	push @link_lines, "$query\t$eval_arg";
	100	}
	101
	102	for my $line ( @link_lines ) {
	103	my ($hit, $e_val) = split(/\t/,$line);
	104
	105	if ($e_val <= $e_thresh) {
	106	process_line($hit,$sth{seed2link_acc},$sth{seed2link_id});
	107	}
	108	}
	109
	110	for my $acc ( keys %acc_uniq ) {
	111
	112	$sth{link2seq}->execute($acc);
	113	while (my $row_href = $sth{link2seq}->fetchrow_hashref ) {
	114	my $id_str = $row_href->{id};
	115	if ($prim_acc) {
	116	$id_str .= "_".$row_href->{prim_acc};
	117	}
	118
	119	printf(">%s|%s|%s %s\n","iso",$acc,$id_str,$row_href->{descr});
	120	my $iso_seq = $row_href->{seq};
	121	$iso_seq =~ s/(.{60})/$1\n/g;
	122
	123	print "$iso_seq\n";
	124	}
	125	$sth{link2seq}->finish();
	126	}
	127
	128	$dbh->disconnect();
	129
	130	sub process_line{
	131	my ($seqid,$sth_acc, $sth_id)=@_;
	132
	133	my $sth = $sth_acc;
	134
	135	my ($db, $link_acc, $link_id) = ("","","");
	136
	137	if ($seqid =~ m/\|/) {
	138	($db, $link_acc, $link_id) = split('\|',$seqid);
	139	$link_acc =~ s/\.\d+$//;
	140
	141	$sth_acc->execute($link_acc);
	142	}
	143	elsif ($seqid =~ m/:/) {
	144	($db, $link_id) = split(':',$seqid);
	145	$sth_id->execute($link_id);
	146	$sth = $sth_id;
	147	}
	148	else {
	149	$link_acc = $seqid;
	150	$link_acc =~ s/\.\d+$//;
	151	$sth_acc->execute($link_acc);
	152	}
	153
	154	while (my ($acc) = $sth->fetchrow_array()) {
	155	next if ($acc eq $link_acc);
	156	$acc_uniq{$acc} = $link_acc unless $acc_uniq{$acc};
	157	}
	158	$sth->finish();
	159	}
	160
	161	__END__
	162
	163	=pod
	164
	165	=head1 NAME
	166
	167	expand_up_isoforms.pl expand_file.tab
	168
	169	=head1 SYNOPSIS
	170
	171	expand_up_isoforms.pl expand_file.tab
	172
	173	=head1 OPTIONS
	174
	175	-h short help
	176	--help include description
	177	--evalue E()-value threshold for expansion
	178	--prim_acc : show primary accession as part of sequence identifier
	179	>iso|E7EWW9|E7EWW9_HUMAN becomes >iso|E7EWW9|E7EWW9_HUMAN_P09488
	180
	181	--host, --user, --password, --port --db : info for mysql database
	182	--a_table, --i_table -- SQL table names with reference and isoform acc/id/prim_acc mappings.
	183
	184	=head1 DESCRIPTION
	185
	186	C<expand_up_isoforms.pl> uses protein isoform tables in an SQL database to identify and extract
	187	isoforms of proteins in a reference protein sequence database.
	188
	189	C<expand_up_isoforms.pl> takes a file with sequence identifiers and E()-values of the form:
	190
	191	sp|P09488|GSTM1_HUMAN <tab> 1e-40
	192	sp:CALM_HUMAN <tab> 1e-40
	193
	194	Lines with E()-values less than --evalue (1E-6 by default) are used to
	195	identify protein isoforms, which are included in the set of sequences to be aligned.
	196
	197	C<expand_up_isoforms.pl> is designed to be used by the B<FASTA> programs with
	198	the C<-e expand_up_isoforms.pl> option.
	199
	200	=head1 AUTHOR
	201
	202	William R. Pearson, wrp@virginia.edu
	203
	204	=cut
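
The selection step described in the POD above (keep hits at or below the E()-value threshold, then pull an accession or an identifier out of the sequence id) can be sketched in a few lines. This is a Python 3 illustration under stated assumptions, not a port of the Perl script: `select_hits` is a hypothetical name, input lines are assumed to be tab-separated, and no database lookup is performed.

```python
import re

def select_hits(lines, e_thresh=1e-6):
    """Return ('acc', accession) or ('id', identifier) for hits passing the threshold."""
    selected = []
    for line in lines:
        line = line.rstrip("\n").lstrip(">")
        if not line:
            continue
        hit, e_val = line.split("\t")[:2]
        if float(e_val) > e_thresh:
            continue
        if "|" in hit:
            # db|acc|id form: use the accession, dropping any version suffix
            acc = re.sub(r"\.\d+$", "", hit.split("|")[1])
            selected.append(("acc", acc))
        elif ":" in hit:
            # db:id form: only the identifier is available
            selected.append(("id", hit.split(":")[1]))
        else:
            # bare accession
            selected.append(("acc", re.sub(r"\.\d+$", "", hit)))
    return selected

if __name__ == "__main__":
    demo = ["sp|P09488|GSTM1_HUMAN\t1.1e-50", "sp:CALM_HUMAN\t1e-40"]
    print(select_hits(demo))
```

Returning the kind of key alongside its value mirrors the way the script picks between its accession-based and id-based queries.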

+84

-0

scripts/fasta_annot_cmd.sh

	0	#!/bin/bash
	1
	2	cmd="";
	3	for i in "$@"
	4	do
	5	case $i in
	6	--outname=*)
	7	OUTNAME="${i#*=}"
	8	shift # past argument=value
	9	;;
	10	--query=*)
	11	QUERY="${i#*=}"
	12	shift # past argument=value
	13	;;
	14	--db=*)
	15	DATABASE="${i#*=}"
	16	shift # past argument=value
	17	;;
	18	--cmd=*)
	19	SRCH_CMD="${i#*=}"
	20	shift
	21	;;
	22	--ktup=*)
	23	KTUP="${i#*=}"
	24	shift
	25	;;
	26	*)
	27	cmd="$cmd $i"
	28	;;
	29	esac
	30	done
	31
	32
	33	# echo "OUTNAME: " $OUTNAME
	34	echo "# CMD: " $cmd
	35
	36	if [[ $OUTNAME == '' ]]; then
	37	OUTNAME=${QUERY}_out
	38	fi
	39
	40	if [[ $SRCH_CMD == '' ]]; then
	41	SRCH_CMD=fasta36
	42	fi
	43
	44	#if [[ $ANN_SCRIPT == '' ]]; then
	45	# ANN_SCRIPT="/seqprg/bin/ann_pfam30.pl --db=pfam31_qfo --host=localhost --neg --vdoms --acc_comment"
	46	#fi
	47
	48
	49	# echo "OUTNAME: " $OUTNAME
	50
	51	bl0_out="$OUTNAME.html"
	52	bla_out="${OUTNAME}_an.html"
	53	blt_out="$OUTNAME.fa_tab"
	54	blr_out="$OUTNAME.fa_tab_rn"
	55
	56	export BLAST_PATH="/seqprg/bin"
	57	# BLAST_PATH="../bin"
	58
	59	cmd="$cmd -mF8CBL=$blt_out $QUERY $DATABASE"
	60
	61	# echo "tmp_files:"
	62	# echo $bl_asn $bl0_out $bla_out $blt_out
	63	# echo "OUTFILE = ${OUTNAME}"
	64
	65	#echo "cmd: $cmd"
	66	#echo "==="
	67	#echo "bl0_out: $bl0_out"
	68	#echo "==="
	69
	70	# echo "$BLAST_PATH/$SRCH_CMD $cmd > $bl0_out"
	71
	72	# run the program
	73	$BLAST_PATH/$SRCH_CMD $cmd > $bl0_out
	74
	75	$BLAST_PATH/rename_exons.py --have_qslen --dom_info $blt_out > $blr_out
	76
	77	if [ ! -s $blr_out ]; then
	78	# echo "# " `ls -l $blt_out $blr_out`
	79	blr_out=$blt_out
	80	# echo "# " `ls -l $blt_out $blr_out`
	81	fi
	82
	83	$BLAST_PATH/merge_fasta_btab.pl --plot_url="plot_domain6t.cgi" --have_qslen --dom_info --btab $blr_out $bl0_out

+53

-0

scripts/get_genome_seq.py

	0	#!/usr/bin/python
	1
	2	################
	3	## get_genome_seq.py parses a genome coordinate (hg38, mm10, or rn6) into a pseudo-bed entry,
	4	## and runs bedtools getfasta to return the fasta sequence
	5	##
	6
	7	import sys
	8	import re
	9	from subprocess import Popen, PIPE, STDOUT
	10	import shlex
	11	import argparse
	12
	13	## a genome_loc should look like: chr#:start-stop
	14	## if stop < start, coordinates are reversed
	15
	16	genome_dict={'hg38':'genome_dna/hg38/reference.fa',
	17	'mm10':'genome_dna/mm10/reference.fa',
	18	'rn6':'genome_dna/rn6/rn6.fa'}
	19
	20	parser=argparse.ArgumentParser(description='get_genome_seq.py : get fasta sequence from genome coordinates ')
	21	parser.add_argument('--genome', help='genome: hg38 | mm10 | rn6',dest='genome',action='store',default='hg38')
	22	parser.add_argument('coords', help='genome coordinates chr1:12345-54321', nargs='*')
	23
	24	args=parser.parse_args()
	25
	26	bed_cmd = 'bedtools getfasta -fi $RDLIB2/%s -bed stdin' % (genome_dict[args.genome])
	27
	28	bed_lines = ''
	29	for genome_loc in args.coords:
	30
	31	chrom, g_range = genome_loc.split(':')
	32	g_start, g_end = g_range.split('-')
	33
	34	g_start, g_end = int(g_start), int(g_end)
	35
	36	if (g_start > g_end):
	37	g_start, g_end = g_end, g_start
	38	g_start -= 1
	39
	40	bed_lines += '%s\t%d\t%d\n' % (chrom, g_start, g_end)
	41
	42	bed_p = Popen(bed_cmd, stdout=PIPE, stdin=PIPE, stderr=STDOUT, shell=True)
	43	out, err = bed_p.communicate(input=bed_lines)
	44
	45	for line in out.split('\n'):
	46	if (line and line[0]=='>'):
	47	(chrom, start, stop) = re.search(r'>([^:]+):(\d+)\-(\d+)',line).groups()
	48	print line + " @C:%s" % (start)
	49	elif (line):
	50	print line
	51
	52
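
The coordinate handling in `get_genome_seq.py` converts a 1-based, inclusive `chr:start-stop` location into a 0-based, half-open BED line. A minimal Python 3 sketch of that conversion follows. `coord_to_bed` is a hypothetical helper, and it converts to integers before comparing, since comparing the raw strings would order '1000' before '200'. It does not call bedtools.

```python
def coord_to_bed(genome_loc):
    """Turn 'chr1:12345-54321' into a tab-separated BED line (hypothetical helper)."""
    chrom, g_range = genome_loc.split(":")
    g_start, g_end = (int(x) for x in g_range.split("-"))
    # reversed coordinates are swapped so that start <= end
    if g_start > g_end:
        g_start, g_end = g_end, g_start
    # BED starts are 0-based, so subtract one from the 1-based start
    return "%s\t%d\t%d" % (chrom, g_start - 1, g_end)

if __name__ == "__main__":
    print(coord_to_bed("chr1:12345-54321"))
```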

+74

-0

scripts/get_protein.py

	0	#!/usr/bin/python
	1
	2	## get_protein.py --
	3	## get a protein sequence from Uniprot or NCBI/Refseq using the accession
	4	##
	5
	6	import sys
	7	import re
	8	import textwrap
	9	from urllib2 import urlopen
	10
	11	ncbi_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
	12	uniprot_url = "https://www.uniprot.org/uniprot/"
	13
	14	sub_range = ''
	15	for acc in sys.argv[1:]:
	16
	17	if (re.search(r':',acc)):
	18	(acc, sub_range) = acc.split(':')
	19
	20	if (re.match(r'^(sp|tr|iso|ref)\|',acc)):
	21	acc=acc.split('|')[1]
	22
	23	if (re.match(r'[NX]P_',acc)):
	24	db_type="protein"
	25
	26	seq_args = "db=%s&id=" % (db_type) + ",".join(sys.argv[1:]) + "&rettype=fasta"
	27	seq_html = urlopen(ncbi_url + seq_args).read()
	28	else:
	29	seq_html = urlopen(uniprot_url + acc + ".fasta").read()
	30
	31	header=''
	32	seq = ''
	33	for line in seq_html.split('\n'):
	34	if (line and line[0]=='>'):
	35	# print out old one if there
	36	if (header):
	37	if (sub_range):
	38	start, stop = sub_range.split('-')
	39	start, stop = int(start), int(stop)
	40	if (start > 0):
	41	start -= 1
	42	new_seq = seq[start:stop]
	43	else:
	44	start = 0
	45	new_seq = seq
	46
	47	if (start > 0):
	48	print "%s @C:%d" %(header, start+1)
	49	else:
	50	print header
	51	print '\n'.join(textwrap.wrap(new_seq))
	52
	53	header = line;
	54	seq = ''
	55	else:
	56	seq += line
	57
	58	start=0
	59	if (sub_range):
	60	start, stop = sub_range.split('-')
	61	start, stop = int(start), int(stop)
	62	if (start > 0):
	63	start -= 1
	64	new_seq = seq[start:stop]
	65	else:
	66	new_seq = seq
	67
	68	if (start > 0):
	69	print "%s @C:%d" %(header, start+1)
	70	else:
	71	print header
	72
	73	print '\n'.join(textwrap.wrap(new_seq))
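
The sub-range handling in `get_protein.py` (an `acc:start-stop` argument selects a 1-based, inclusive slice and tags the header with an `@C:` offset) can be illustrated on its own. This is a Python 3 sketch with a hypothetical function name, `apply_subrange`, and it works on an in-memory sequence string rather than a downloaded record.

```python
def apply_subrange(header, seq, sub_range=""):
    """Slice seq by a 1-based 'start-stop' range and annotate the header (sketch)."""
    start = 0
    if sub_range:
        start, stop = (int(x) for x in sub_range.split("-"))
        if start > 0:
            start -= 1          # convert the 1-based start to a 0-based index
        seq = seq[start:stop]   # stop is inclusive in 1-based terms
    if start > 0:
        # record the 1-based coordinate of the first residue kept
        header = "%s @C:%d" % (header, start + 1)
    return header, seq

if __name__ == "__main__":
    print(apply_subrange(">demo", "ABCDEFGHIJ", "3-6"))
```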

+17

-0

scripts/get_refseq.py

	0	#!/usr/bin/python
	1
	2	import sys
	3	import re
	4	from urllib2 import urlopen
	5
	6
	7	db_type="protein"
	8	if (re.match(r'[NX]M_',sys.argv[1])):
	9	db_type="nucleotide"
	10
	11	seq_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
	12	seq_args = "db=%s&id=" % (db_type) + ",".join(sys.argv[1:]) + "&rettype=fasta"
	13
	14	seq_html = urlopen(seq_url + seq_args).read()
	15
	16	print seq_html

+12

-0

scripts/get_uniprot.py

	0	#!/usr/bin/python
	1
	2	import sys
	3	from urllib import urlopen
	4
	5	ARGV = sys.argv[1:];
	6
	7	for acc in ARGV :
	8	url = "https://www.uniprot.org/uniprot/" + acc + ".fasta"
	9	# print url
	10	fa_seq = urlopen(url).read()
	11	print fa_seq

+51

-0

scripts/get_up_prot_iso_sql.py

	0	#!/usr/bin/python
	1
	2	import sys
	3	import re
	4	import textwrap
	5	import argparse
	6	import MySQLdb.cursors
	7
	8	from urllib2 import urlopen
	9
	10	ncbi_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
	11	uniprot_url = "https://www.uniprot.org/uniprot/"
	12
	13	db = MySQLdb.connect(db='uniprot', host='xdb', user='web_user', passwd='fasta_www',
	14	cursorclass=MySQLdb.cursors.DictCursor)
	15
	16	cur1 = db.cursor()
	17	cur2 = db.cursor()
	18	get_iso_acc='select acc from annot2_iso where prim_acc="%s"'
	19	get_fasta_info='select db, acc, id, descr, seq from annot2 join protein using(acc) where acc="%s"'
	20	get_iso_fasta_info='select db, acc, id, descr, seq from annot2_iso join protein_iso using(acc) where prim_acc="%s"'
	21
	22	fasta_seqs=[]
	23
	24	for acc in sys.argv[1:]:
	25
	26	if (re.search(r':',acc)):
	27	(acc, sub_range) = acc.split(':')
	28
	29	if (re.match(r'^(sp|tr|iso|ref)\|',acc)):
	30	acc=acc.split('|')[1]
	31
	32	cur1.execute(get_fasta_info%(acc,))
	33	row = cur1.fetchone()
	34	if (row):
	35	fasta_seqs.append(row)
	36	else:
	37	sys.stderr.write("*error* %s sequence not found\n"%(acc))
	38	continue
	39
	40	cur2.execute(get_iso_fasta_info%(acc,))
	41	for row in cur2:
	42	fasta_seqs.append(row)
	43
	44	for row in fasta_seqs:
	45	print ">%s|%s|%s %s"%(row['db'],row['acc'],row['id'],row['descr'])
	46	print '\n'.join(textwrap.wrap(row['seq']))
	47
	48
	49
	50

+2

-1

scripts/lav2plt.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	# lav2plt.pl - produce plot from lav output */
3	3

21	21	# governing permissions and limitations under the License.
22	22	################################################################
23	23
	24	use warnings;
24	25	use strict;
25	26	use Getopt::Long;
26	27	use Pod::Usage;

+5

-0

scripts/lavplt_ps.pl

	0	#!/usr/bin/env perl
	1	#
0	2	################################################################
1	3	# copyright (c) 2012, 2014 by William R. Pearson and The Rector &
2	4	# Visitors of the University of Virginia */

13	15	# express or implied. See the License for the specific language
14	16	# governing permissions and limitations under the License.
15	17	################################################################
	18
	19	use warnings;
	20	use strict;
16	21
17	22	#define SX(x) (int)((double)(x)*fxscal+fxoff+24)
18	23	sub SX {

+5

-1

scripts/lavplt_svg.pl

0
	0	#!/usr/bin/env perl
	1	#
1	2	################################################################
2	3	# copyright (c) 2012, 2014 by William R. Pearson and The Rector &
3	4	# Visitors of the University of Virginia */

15	16	# governing permissions and limitations under the License.
16	17	################################################################
17	18
	19	use warnings;
	20	use strict;
	21
18	22	#define SX(x) (int)((double)(x)*fxscal+fxoff+6)
19	23	sub SX {
20	24	my $xx = shift;

+2

-1

scripts/links2sql.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

16	16	# governing permissions and limitations under the License.
17	17	################################################################
18	18
	19	use warnings;
19	20	use strict;
20	21	use DBI;
21	22	use Getopt::Long;

+2

-1

scripts/m8_btop_msa.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

39	39	#
40	40	################################################################
41	41
	42	use warnings;
42	43	use strict;
43	44	use IPC::Open2;
44	45	use Pod::Usage;

+2

-1

scripts/m9B_btop_msa.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014,2015 by William R. Pearson and The Rector &

36	36	#
37	37	################################################################
38	38
	39	use warnings;
39	40	use strict;
40	41	use IPC::Open2;
41	42	use Pod::Usage;

+488

-0

scripts/map_exon_coords.py

	0	#!/usr/bin/env python
	1	#
	2	# given a -m8CB file with exon annotations for the query and subject,
	3	# provide a function that maps subject coordinates to query, or vice versa
	4
	5	################################################################
	6	# copyright (c) 2018 by William R. Pearson and The Rector &
	7	# Visitors of the University of Virginia */
	8	################################################################
	9	# Licensed under the Apache License, Version 2.0 (the "License");
	10	# you may not use this file except in compliance with the License.
	11	# You may obtain a copy of the License at
	12	#
	13	# http://www.apache.org/licenses/LICENSE-2.0
	14	#
	15	# Unless required by applicable law or agreed to in writing,
	16	# software distributed under this License is distributed on an "AS
	17	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	18	# express or implied. See the License for the specific language
	19	# governing permissions and limitations under the License.
	20	################################################################
	21
	22	import fileinput
	23	import sys
	24	import re
	25	import argparse
	26	import copy
	27
	28	################
	29	# "domain" class that describes a domain/exon alignment annotation
	30	#
	31	class exonInfo:
	32	def __init__(self, name, q_target, p_start, p_end, chrom, d_start, d_end, full_text):
	33	self.name = name
	34	self.q_target = q_target
	35	self.p_start = p_start
	36	self.p_end = p_end
	37	self.chrom = chrom
	38	self.d_start = d_start
	39	self.d_end = d_end
	40	self.text = full_text
	41	self.plus_strand = True
	42	if (d_start > d_end):
	43	self.plus_strand = False
	44
	45	def __str__(self):
	46	rxr_str = "XD"
	47	if (self.q_target):
	48	rxr_str="DX"
	49	return '|%s:%i-%i:%s{%s:%i-%i}' % (rxr_str, self.p_start, self.p_end, self.name, self.chrom, self.d_start, self.d_end)
	50
	51	class exonAlign:
	52	def __init__(self, name, q_target, qp_start, qp_end, sp_start, sp_end, full_text):
	53	self.exon = None
	54
	55	self.name = name
	56	self.q_target = q_target
	57
	58	self.q_start = qp_start
	59	self.q_end = qp_end
	60	self.s_start = sp_start
	61	self.s_end = sp_end
	62
	63	self.text = full_text
	64	self.out_str = ''
	65
	66	def __str__(self):
	67	rxr_str = "RX"
	68	if (self.q_target):
	69	rxr_str="XR"
	70	return "[%s:%i-%i:%i-%i::%s" % (rxr_str,self.q_start, self.q_end, self.s_start, self.s_end, self.name)
	71
	72	def print_bar_str(self): # checking for 'NADA'
	73	if (not self.out_str):
	74	self.out_str = self.text
	75	return str("|%s"%(self.out_str))
	76
	77	# Parses domain annotations after split at '|'
	78
	79	#
	80	def parse_exon_align(text):
	81	# takes a domain in string form, turns it into a domain object
	82	# looks like: RX:5-82:5-82:s=397;b=163.1;I=1.000;Q=453.6;C=C.Thioredoxin~1
	83	# could also look like: RX:5-82:5-82:s=397;b=163.1;I=1.000;Q=453.6;C=C.Thioredoxin{PF012445}~1
	84
	85	# get RX/XR and qstart/qstop sstart/sstop as strings
	86	m = re.search(r'^(\w+):(\d+)-(\d+):(\d+)-(\d+):',text)
	87	if (m):
	88	(RXRState, qstart_s, qend_s, sstart_s, send_s) = m.groups()
	89	else:
	90	sys.stderr.write("could not parse exon location: %s\n"%(text))
	91
	92	# get domain name/color (and possibly {info})
	93
	94	(name, color_s) = re.search(r';C=([^~]+)~(.+)$',text).groups()
	95	info_s=""
	96
	97	if (re.search(r'\}$',name)):
	98	(name, info_s) = re.search(r'([^\{]+)(\{[^\}]+\})$',name).groups()
	99
	100	q_target = True
	101	if (RXRState=='XR'):
	102	q_target = False
	103
	104	exon_align = exonAlign(name, q_target, int(qstart_s), int(qend_s), int(sstart_s), int(send_s),
	105	text)
	106
	107	return exon_align
	108
	109	################
	110	# exon_info is like domain, but no scores
	111	#
	112	def parse_exon_info(text):
	113	# takes a domain in string form, turns it into a domain object
	114	# looks like: DX:1-100;C=C.Thioredoxin~1
	115
	116	(RXRState, start_s, end_s,name, color) = re.search(r'^(\w+):(\d+)-(\d+);C=([^~]+)~(.*)$',text).groups()
	117	info = ""
	118	if (re.search(r'\}$',name)):
	119	(name, info) = re.search(r'([^\{]+)(\{[^\}]+\})$',name).groups()
	120
	121	gene_re = re.search(r'^\{(\w+):(\d+)\-(\d+)\}',info)
	122	if (gene_re):
	123	(chrom, d_start, d_end) = gene_re.groups()
	124	else:
	125	sys.stderr.write("genome info not found: %s\n" % (text))
	126
	127	q_target = True;
	128	if (RXRState == 'XD'):
	129	q_target = False
	130
	131	exon_info = exonInfo(name, q_target, int(start_s), int(end_s), chrom, int(d_start), int(d_end), text)
	132
	133	return exon_info
	134
	135	####
	136	# parse_protein(result_line)
	137	# takes a protein in string format, turns it into a dictionary properly
	138	# looks like: sp|P30711|GSTT1_HUMAN up|Q2NL00|GSTT1_BOVIN 86.67 240 32 0 1 240 1 240 1.4e-123 444.0 16VI7DR6IT3IR15KQ3AI6TI11TA7YH8RC12TA3SN10FL10QETM2AT6VMTA2LV2DG4ND6PS24EK6TA11DV14FSPQ5IL3LMML1WK5RQ |XR:4-76:4-76:s=327;b=134.6;I=0.895;Q=367.8;C=C.Thioredoxin~1|RX:5-82:5-82:s=356;b=146.5;I=0.902;Q=403.3;C=C.Thioredoxin~1|RX:83-93:83-93:s=52;b=21.4;I=0.818;Q=30.9;C=NODOM~0|XR:77-93:77-93:s=86;b=35.4;I=0.882;Q=72.6;C=NODOM~0|RX:94-110:94-110:s=88;b=36.2;I=0.882;Q=75.0;C=vC.GST_C~2v|XR:94-110:94-110:s=88;b=36.2;I=0.882;Q=75.0;C=vC.GST_C~2v|RX:111-201:111-201:s=409;b=168.3;I=0.868;Q=468.3;C=C.GST_C~2|XR:111-201:111-201:s=409;b=168.3;I=0.868;Q=468.3;C=C.GST_C~2|RX:202-240:202-240:s=154;b=63.4;I=0.795;Q=155.9;C=NODOM~0|XR:202-240:202-240:s=154;b=63.4;I=0.795;Q=155.9;C=NODOM~0
	139	#
	140	def parse_protein(line_data,fields, req_name):
	141	# last part (domain annotations) split('|') and parsed by parse_exon_align()
	142
	143	data = {}
	144	data = dict(zip(fields, line_data))
	145	if (re.search(r'\|',data['qseqid'])):
	146	data['qseq_acc'] = data['qseqid'].split('|')[1]
	147	else:
	148	data['qseq_acc'] = data['qseqid']
	149
	150	if (re.search(r'\|',data['sseqid'])):
	151	data['sseq_acc'] = data['sseqid'].split('|')[1]
	152	else:
	153	data['sseq_acc'] = data['sseqid']
	154
	155	Qexon_list = []
	156	Sexon_list = []
	157
	158	Qinfo_list = []
	159	Sinfo_list = []
	160
	161	counter = 0
	162
	163	if ('align_annot' in data and len(data['align_annot']) > 0):
	164	for exon_str in data['align_annot'].split('|')[1:]:
	165	if (req_name and not re.search(req_name, exon_str)):
	166	continue
	167
	168	counter += 1
	169	exon = parse_exon_align(exon_str)
	170	if (exon.q_target):
	171	Qexon_list.append(exon)
	172	else:
	173	Sexon_list.append(exon)
	174
	175	data['q_exalign_list'] = Qexon_list
	176	data['s_exalign_list'] = Sexon_list
	177
	178	if ('exon_info' in data and len(data['exon_info']) > 0):
	179	for info_str in data['exon_info'].split('|')[1:]:
	180	if (not re.search(r'^[DX][XD]',info_str)):
	181	continue
	182
	183	dinfo = parse_exon_info(info_str)
	184
	185	if (dinfo.q_target):
	186	Qinfo_list.append(dinfo)
	187	else:
	188	Sinfo_list.append(dinfo)
	189
	190
	191	# put links to info_list into exon_list so info_list names can
	192	# be changed -- give S/Qinfo's the S/Qdom ids of the overlapping domain
	193
	194	# find_info_overlaps(Qinfo_list, Qexon_list)
	195	# find_info_overlaps(Sinfo_list, Sexon_list)
	196
	197	data['q_exinfo_list'] = Qinfo_list
	198	data['s_exinfo_list'] = Sinfo_list
	199
	200	return data
	201
	202	################
	203	#
	204	# decode_btop() -
	205	# input: a blast BTOP string of the form: "1VA160TS7KG10RK27"
	206	# returns a list of string tokens: ("1", "VA", "160", "TS", "7", "KG", "10", "RK", "27")
	207	def decode_btop(btop_str):
	208	out_tokens = []
	209	for token in re.split(r'(\d+)',btop_str):
	210	if (not token): continue
	211	if re.match(r'\d+',token):
	212	out_tokens.append(token)
	213	else:
	214	for mismat in re.split(r'(..)',token):
	215	if (mismat): out_tokens.append(mismat)
	216
	217	return out_tokens
	218
	219	################
	220	#
	221	# map_align(btop, q_start, s_start)
	222	# input: btop
	223	# output: q_pos_arr, s_pos_arr
	224	#
	225	def map_align(btop_str, q_start, s_start):
	226
	227	q_pos = q_start
	228	s_pos = s_start
	229
	230	q_pos_arr = []
	231	s_pos_arr = []
	232
	233	btop_tokens = decode_btop(btop_str)
	234
	235	for t in btop_tokens:
	236	if (re.match(r'\d+',t)):
	237	for i in range(int(t)) :
	238	q_pos_arr.append(q_pos)
	239	q_pos += 1
	240	s_pos_arr.append(s_pos)
	241	s_pos += 1
	242	elif (re.match(r'\-\w',t)):
	243	q_pos_arr.append(q_pos)
	244	s_pos_arr.append(s_pos)
	245	s_pos += 1
	246	elif (re.match(r'\w\-',t)):
	247	q_pos_arr.append(q_pos)
	248	q_pos += 1
	249	s_pos_arr.append(s_pos)
	250	else:
	251	q_pos_arr.append(q_pos)
	252	q_pos += 1
	253	s_pos_arr.append(s_pos)
	254	s_pos += 1
	255
	256	return q_pos_arr, s_pos_arr
	257
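
The two functions above, `decode_btop()` and `map_align()`, turn a BLAST BTOP string into parallel lists of query and subject positions. The following is a compact Python 3 sketch of the same idea for experimentation, written independently of the committed code. It assumes BTOP tokens are either a run length of identical residues or a two-character pair in which the first character is the query residue and the second the subject residue, with '-' marking a gap.

```python
import re

def decode_btop(btop_str):
    """Split a BTOP string into run-length tokens and two-character pairs."""
    tokens = []
    for token in re.split(r"(\d+)", btop_str):
        if not token:
            continue
        if token.isdigit():
            tokens.append(token)
        else:
            # a non-numeric run holds one or more two-character pairs
            tokens.extend(token[i:i + 2] for i in range(0, len(token), 2))
    return tokens

def map_align(btop_str, q_start, s_start):
    """Return per-column query and subject positions for an alignment."""
    q_pos, s_pos = q_start, s_start
    q_arr, s_arr = [], []
    for t in decode_btop(btop_str):
        if t.isdigit():
            # a run of identities advances both sequences
            for _ in range(int(t)):
                q_arr.append(q_pos)
                s_arr.append(s_pos)
                q_pos += 1
                s_pos += 1
        else:
            q_arr.append(q_pos)
            s_arr.append(s_pos)
            if t[0] != "-":   # gap in query: query position does not advance
                q_pos += 1
            if t[1] != "-":   # gap in subject: subject position does not advance
                s_pos += 1
    return q_arr, s_arr

if __name__ == "__main__":
    print(decode_btop("1VA60TS7"))
    print(map_align("2AG-C1", 1, 1))
```

Looking up a query coordinate in the first list and reading the same index in the second list gives the aligned subject coordinate, which is the role `map_coords()` plays in the script.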
	258	################
	259	#
	260	# map_coords(from_coords, to_coords, coord_list)
	261	#
	262	def map_coords(from_coords, to_coords, coord_list):
	263
	264	mapped_coords = []
	265
	266	fx = 0
	267	mx = 0
	268	while mx < len(coord_list):
	269	this_from_coord = coord_list[mx]
	270	while (from_coords[fx] < this_from_coord):
	271	fx += 1
	272	continue
	273
	274	mapped_coords.append(to_coords[fx])
	275	mx += 1
	276
	277	return mapped_coords
	278
	279	################
	280	#
	281	# map_align_coords() given a BTOP, q_start, s_start, and s_target, generate s_coords for list of q_coords
	282	#
	283	def map_align_coords(btop_str, q_start, s_start, s_target, coord_list):
	284
	285	(q_coords, s_coords) = map_align(btop_str, q_start, s_start)
	286
	287	sorted_coord_list = sorted(coord_list)
	288
	289	if (s_target):
	290	s_mapped_coords = map_coords(q_coords, s_coords, sorted_coord_list)
	291	else:
	292	s_mapped_coords = map_coords(s_coords, q_coords, sorted_coord_list)
	293
	294	coord_dict={}
	295	for ix, s_coord in enumerate(sorted_coord_list):
	296	coord_dict[s_coord]=s_mapped_coords[ix]
	297
	298	return [ coord_dict[c] for c in coord_list ]
	299
	300
	301	################
	302	#
	303	# aa_to_exon() --- given a coordinate and the corresponding exon map, return the exon coordinate
	304	# (can only be done for aligned exons)
	305	#
	306	# this version of the function must use an info_list, not an
	307	# align_list, because it uses p_start/p_end rather than qp_start/sp_start, etc.
	308	# a version using qp_start/sp_start would also need a target argument
	309	#
	310	def aa_to_exon(aa_coords, exon_info_list):
	311
	312	    sorted_aa_coords = sorted(aa_coords)
	313
	314	    pos_strand = True
	315	    if (exon_info_list[0].d_start > exon_info_list[0].d_end):
	316	        pos_strand = False
	317
	318	    ex_x = 0
	319	    exon_coords = []
	320
	321	    aap_x = 0
	322	    this_aap = sorted_aa_coords[aap_x]
	323	    while (ex_x < len(exon_info_list)):
	324	        this_exon = exon_info_list[ex_x]
	325	        if (this_aap <= this_exon.p_end and this_aap >= this_exon.p_start):
	326	            aa_dna_offset = (this_aap - this_exon.p_start) * 3
	327
	328	            if (pos_strand):
	329	                aa_dna_pos = this_exon.d_start + aa_dna_offset
	330	            else:
	331	                aa_dna_pos = this_exon.d_start - aa_dna_offset
	332
	333	            exon_coords.append({'chrom':this_exon.chrom, 'dpos':aa_dna_pos})
	334	            aap_x += 1
	335	            if (aap_x < len(sorted_aa_coords)):
	336	                this_aap = sorted_aa_coords[aap_x]
	337	            else:
	338	                break
	339	        else:
	340	            ex_x += 1
	341
	342	    aa_coord_dict = {}
	343	    for aap_x, aap in enumerate(sorted_aa_coords):
	344	        aa_coord_dict[aap] = exon_coords[aap_x]
	345
	346	    return [aa_coord_dict[ax] for ax in aa_coords]
	347
	348	################
	349	# set_data_fields() -- initialize field[] used to generate data[] dict
	350	#
	351	def set_data_fields(args, line_data) :
	352
	353	    field_str = 'qseqid sseqid pident length mismatch gapopen q_start q_end s_start s_end evalue bitscore BTOP align_annot'
	354	    field_qs_str = 'qseqid q_len sseqid s_len pident length mismatch gapopen q_start q_end s_start s_end evalue bitscore BTOP align_annot'
	355
	356	    if (len(line_data) > 1) :
	357	        if ((not args.have_qslen) and re.search(r'\d+',line_data[1])):
	358	            args.have_qslen=True
	359
	360	        if ((not args.exon_info) and re.search(r'^\|[DX][XD]\:',line_data[-1])):
	361	            args.exon_info = True
	362
	363	    end_field = -1
	364	    fields = field_str.split(' ')
	365
	366	    if (args.have_qslen):
	367	        fields = field_qs_str.split(' ')
	368
	369	    if (args.exon_info):
	370	        fields.append('exon_info')
	371	        end_field = -2
	372
	373	    return (fields, end_field)
	374
	375	################################################################
	376	#
	377	# main program
	378	# print "#"," ".join(sys.argv)
	379
	380	def main():
	381
	382	    data_fields_reset=False
	383
	384	    parser=argparse.ArgumentParser(description='map_exon_coords.py result_file.m8CB saa:coord : map subject coordinate to query genomic coordinate')
	385	    parser.add_argument('--have_qslen', help='bl_tab fields include query/subject lengths',dest='have_qslen',action='store_true',default=False)
	386	    parser.add_argument('--exon_info', help='raw domain coordinates included',action='store_true',default=True)
	387	    parser.add_argument('--subj_aa',help='subject aa coordinate to map',action='store',type=int,dest='subj_aa_coord',default=1)
	388	    parser.add_argument('files', metavar='FILE', nargs='*', help='files to read, if empty, stdin is used')
	389	    args=parser.parse_args()
	390
	391	    end_field = -1
	392	    data_fields_reset=False
	393
	394	    (fields, end_field) = set_data_fields(args, [])
	395
	396	    if (args.have_qslen and args.exon_info):
	397	        data_fields_reset=True
	398
	399	    saved_qexon_list = []
	400	    qexon_list = []
	401
	402	    for line in fileinput.input(args.files):
	403	        # pass through comments
	404	        if (line[0] == '#'):
	405	            print line, # ',' because have not stripped
	406	            continue
	407
	408	        ################
	409	        # break up tab fields, check for extra fields
	410	        line = line.strip('\n')
	411	        line_data = line.split('\t')
	412	        if (not data_fields_reset): # look for --have_qslen number, --exon_info data, even if not set
	413	            (fields, end_field) = set_data_fields(args, line_data)
	414	            data_fields_reset = True
	415
	416	        ################
	417	        # get exon annotations
	418	        # produces: data['q_exalign_list'], data['s_exalign_list']
	419	        #           data['q_exinfo_list'], data['s_exinfo_list']
	420	        data = parse_protein(line_data,fields,"exon") # get score/alignment/domain data
	421
	422	        # extract aligned query_coordinates
	423	        q_coords = []
	424	        sa_from_qa = []
	425	        for q_ex in data['q_exalign_list']:
	426	            q_coords.append(q_ex.q_start)
	427	            q_coords.append(q_ex.q_end)
	428	            sa_from_qa.append(q_ex.s_start)
	429	            sa_from_qa.append(q_ex.s_end)
	430
	431	        s_coords = []
	432	        qa_from_sa = []
	433	        for s_ex in data['s_exalign_list']:
	434	            s_coords.append(s_ex.s_start)
	435	            s_coords.append(s_ex.s_end)
	436	            qa_from_sa.append(s_ex.q_start)
	437	            qa_from_sa.append(s_ex.q_end)
	438
	439	        ################
	440	        # map aligned coordinates in query to subject exons
	441	        # -- this is not necessary -- it is already in data['q_exalign_list'].s_start/s_end
	442	        # s_target=True
	443	        # sa_from_qa = map_align_coords(data['BTOP'], int(data['q_start']), int(data['s_start']),
	444	        #                               s_target, qa_coords)
	445	        sex_from_qa2sa = aa_to_exon(sa_from_qa, data['s_exinfo_list'])
	446	        qex_from_sa2qa = aa_to_exon(qa_from_sa, data['q_exinfo_list'])
	447
	448
	449	        ################
	450	        # print out non-exon info
	451
	452	        print '\t'.join([str(data[x]) for x in fields[:end_field]]),
	453
	454	        ################
	455	        # edit the full text to insert the other aligned coordinates
	456	        # (also re-order the regions query-first, then subject)
	457	        # for 'q_exalign_list', I need to add the subj_genome_coords sex_from_qa2sa
	458	        # and they need to be second
	459	        # for 's_exalign_list', I need to add the query_genome_coords from qex_from_sa2qa
	460	        # and they need to be first
	461
	462	        q_exalign_out=[]
	463	        for qx, q_exon in enumerate(data['q_exalign_list']):
	464	            sg_start = sex_from_qa2sa[2*qx]
	465	            sg_end = sex_from_qa2sa[2*qx+1]
	466	            sg_replace="::%s:%d-%d}"%(sg_start['chrom'],sg_start['dpos'],sg_end['dpos'])
	467
	468	            this_outstr=re.sub(r'\}',sg_replace,q_exon.text)
	469	            q_exalign_out.append(this_outstr)
	470
	471	        s_exalign_out=[]
	472	        for sx, s_exon in enumerate(data['s_exalign_list']):
	473	            qg_start = qex_from_sa2qa[2*sx]
	474	            qg_end = qex_from_sa2qa[2*sx+1]
	475	            qg_replace="{%s:%d-%d::"%(qg_start['chrom'],qg_start['dpos'],qg_end['dpos'])
	476
	477	            this_outstr=re.sub(r'\{',qg_replace,s_exon.text)
	478	            s_exalign_out.append(this_outstr)
	479
	480	        print "\t|"+"|".join(q_exalign_out+s_exalign_out)+"\t"+line_data[-1]
	481
	482	################
	483	# run the program ...
	484
	485	if __name__ == '__main__':
	486	    main()
	487
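
The coordinate arithmetic in `aa_to_exon()` above can be looked at in isolation. The following is a minimal Python 3 sketch and is not part of the commit: the `Exon` tuple and the `aa_to_genome()` helper are hypothetical names. It assumes each exon record carries protein (`p_start`, `p_end`) and DNA (`d_start`, `d_end`) coordinates, as the script's exon info list does. Unlike the script, which takes the strand from the first exon only, this sketch checks the strand of each exon.

```python
from collections import namedtuple

# Hypothetical stand-in for the exon records consumed by aa_to_exon()
Exon = namedtuple("Exon", "chrom p_start p_end d_start d_end")

def aa_to_genome(aa_pos, exons):
    """Return (chrom, dna_pos) for a residue, or None if no exon covers it."""
    for ex in exons:
        if ex.p_start <= aa_pos <= ex.p_end:
            offset = (aa_pos - ex.p_start) * 3      # three nucleotides per residue
            if ex.d_start <= ex.d_end:              # plus strand: count upward
                return (ex.chrom, ex.d_start + offset)
            return (ex.chrom, ex.d_start - offset)  # minus strand: count downward
    return None

# Made-up coordinates, for illustration only
exons = [Exon("chr1", 1, 10, 1000, 1029), Exon("chr1", 11, 20, 2000, 2029)]
print(aa_to_genome(12, exons))
```

A residue that falls outside every exon yields `None` here, whereas the script only handles aligned exons and would fail on such input.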

+317

-0

scripts/merge_blast_btab.pl

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2018 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia */
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	################################################################
	20	# merge_blast_btab.pl --btab .btab file html_file
	21	################################################################
	22
	23	use warnings;
	24	use strict;
	25	use Getopt::Long;
	26	use Pod::Usage;
	27	use URI::Encode qw(uri_encode);
	28	use URI::Escape qw(uri_escape);
	29
	30	my ($btab_file, $have_qslen, $help, $shelp, $dom_info) = ("", 0, 0, 0, 0);
	31	my ($plot_url) = ("");
	32
	33	GetOptions(
	34	"btab_file|btab=s" => \$btab_file,
	35	"have_qslen|have_sqlen!" => \$have_qslen,
	36	"domain_info|dom_info!" => \$dom_info,
	37	"plot_url=s"=> \$plot_url,
	38	"h|?" => \$shelp,
	39	"help" => \$help,
	40	);
	41
	42	pod2usage(1) if $shelp;
	43	pod2usage(exitstatus => 0, verbose => 2) if $help;
	44	unless (-f STDIN || -p STDIN || @ARGV) {
	45	pod2usage(1);
	46	}
	47
	48	# require a btab file
	49
	50	# read it in, save structure as list/hash on accession (list more robust)
	51	# what happens with multiple hits for same library -- need to add code
	52	#
	53
	54	my @bl_fields = qw(q_seqid s_seqid percid alen mismatch gopen q_start q_end s_start s_end evalue bits score annot);
	55
	56	if ($have_qslen) {
	57	@bl_fields = qw(q_seqid q_len s_seqid s_len percid alen mismatch gopen q_start q_end s_start s_end evalue bits score annot);
	58	}
	59
	60	if ($dom_info) {
	61	push @bl_fields, "dom_info";
	62	}
	63
	64	my %tab_data = ();
	65	my @sseq_ids = ();
	66
	67	unless ($btab_file) {
	68	die "--btab_file required"
	69	}
	70	else {
	71	# read in btab file
	72	open(my $fd, $btab_file) || die "cannot open $btab_file";
	73
	74	while (my $line = <$fd>) {
	75	next if ($line =~ m/^#/); # ignore comments
	76	chomp($line);
	77	my %a_data = ();
	78	@a_data{@bl_fields} = split(/\t/,$line);
	79
	80	# here we should confirm that the sseqid is new. If it is not, then add to a list.
	81	my $sseqid = $a_data{'s_seqid'};
	82
	83	if (defined($tab_data{$sseqid})) {
	84	push @{$tab_data{$sseqid}}, \%a_data
	85	}
	86	else {
	87	$tab_data{$sseqid} = [ \%a_data ];
	88	push @sseq_ids, $sseqid;
	89	}
	90	}
	91	}
	92
	93	# have the annotation data in %tab_data and @sseq_ids
	94	# read in the blastp html file and annotate it
	95
	96	my ($in_best, $in_align) = (0,0);
	97	my ($best_ix, $align_ix, $hsp_ix) = (0,0,0);
	98
	99	while (my $line = <>) {
	100	chomp($line);
	101	unless ($line) {
	102	print "\n";
	103	next;
	104	}
	105	if ($line =~ m/^Sequences producing/) {
	106	$in_best = 1;
	107	$best_ix = 0;
	108	print "$line\n";
	109	next;
	110	}
	111
	112	if ($in_best) {
	113	if ($line =~ /^>/) {
	114	$in_best = 0;
	115	$in_align = 1;
	116	$align_ix = 0;
	117	$hsp_ix = 0;
	118	# print out the first line
	119	print "$line\n";
	120	next;
	121	}
	122	else {
	123	$line = add_best($line, $tab_data{$sseq_ids[$best_ix]}->[0]);
	124	$best_ix++;
	125	}
	126	}
	127
	128	if ($in_align) {
	129	if ($line =~ m/^\s+Score = \d+/) { # have Length= match, put out annotations if available
	130	my $regions_str = regions_to_str($tab_data{$sseq_ids[$align_ix]}->[$hsp_ix]);
	131	print $regions_str;
	132
	133	if ($plot_url) {
	134	my $raw_dom_str = "";
	135	if ($dom_info) {
	136	$raw_dom_str = dom_info_str($tab_data{$sseq_ids[$align_ix]}->[$hsp_ix]{'dom_info'});
	137	}
	138
	139	my $plot_tag = plot_tag_str($plot_url, $tab_data{$sseq_ids[$align_ix]}->[$hsp_ix], $regions_str, $raw_dom_str);
	140	if ($plot_tag) {print $plot_tag,"\n";}
	141	}
	142
	143	$hsp_ix++;
	144
	145	}
	146	elsif ($line =~ m/^>/) {
	147	$align_ix++;
	148	$hsp_ix = 0;
	149	}
	150	}
	151
	152	print "$line\n";
	153	}
	154
	155	sub parse_annots {
	156	my ($annot_str) = @_;
	157
	158	my @annot_list = ();
	159
	160	unless ($annot_str && $annot_str =~ m/^\|/) {
	161	return \@annot_list;
	162	}
	163
	164	my @annots = split('\|',$annot_str);
	165	shift @annots;
	166
	167	for my $annot ( @annots ) {
	168	my %annot_data = ();
	169	next unless ($annot =~ m/^[XR][RX]/);
	170	my @a_fields = split(/;/,$annot);
	171	for my $f (@a_fields) {
	172	if ($f =~ m/^[XR][XR]/) {
	173	my @a2_f = split(':',$f);
	174	if ($a2_f[0] =~ m/^XR/) {
	175	$annot_data{target} = 'subj';
	176	}
	177	else {
	178	$annot_data{target} = 'query';
	179	}
	180	$annot_data{coord} = "$a2_f[1]:$a2_f[2]";
	181	$annot_data{score} = (split('=',$a2_f[3]))[1]
	182	}
	183	elsif ($f =~ m/(\w)=(.+)/) {
	184	$annot_data{$1} = $2;
	185	}
	186	}
	187	$annot_data{name} = $a_fields[-1];
	188	$annot_data{name} =~ s/^C=//;
	189	push @annot_list, \%annot_data;
	190	}
	191	return \@annot_list;
	192	}
	193
	194	sub regions_to_str {
	195	my ($a_data_r) = @_;
	196
	197	my $annot_ref = parse_annots($a_data_r->{annot});
	198
	199	my $region_str = "";
	200	my $annot_str = "";
	201
	202	for my $annot ( @{$annot_ref}) {
	203	if ($annot->{target} =~ m/^q/) {
	204	$region_str = "qRegion";
	205	}
	206	else {
	207	$region_str = " Region";
	208	}
	209
	210	$annot_str .= sprintf "%s: %s : score=%d; bits=%.1f; Id=%.3f; Q=%.1f : %s\n", $region_str,
	211	@{$annot}{qw(coord score b I Q name)};
	212	}
	213	return $annot_str;
	214	}
	215
	216	sub add_best {
	217	my ($line, $a_data) = @_;
	218
	219	my $annot_str = '';
	220
	221	my $annot_refs = parse_annots($a_data->{annot});
	222
	223	for my $annot ( @$annot_refs) {
	224	if ($annot->{target} !~ m/^q/) {
	225	$annot_str .= $annot->{name} . ";"
	226	}
	227	}
	228
	229	if ($annot_str) {
	230	return "$line $annot_str";
	231	}
	232	else {
	233	return $line;
	234	}
	235	}
	236
	237	sub plot_tag_str {
	238
	239	my ($plot_script, $align_data_r, $regions_str, $doms_str) = @_;
	240
	241	my $svg_pref = q(<object type="image/svg+xml" );
	242	my $svg_post = q( width="660" height="76" ></object>);
	243
	244	#build argument string
	245	my %plt_args = ();
	246	@plt_args{qw(q_cstart l_cstart)} = (1, 1);
	247	@plt_args{qw(q_name q_cstop q_astart q_astop l_name l_cstop l_astart l_astop)} =
	248	@{$align_data_r}{qw(q_seqid q_len q_start q_end s_seqid s_len s_start s_end)};
	249	$plt_args{'regions'}= uri_escape(uri_encode($regions_str));
	250	if ($doms_str) {
	251	$plt_args{'doms'} = uri_encode($doms_str);
	252	}
	253
	254	my $dom_info = ();
	255
	256	my @args = map {"$_=$plt_args{$_}"} keys(%plt_args);
	257
	258	return $svg_pref . qq( data="$plot_script?) . join('&',@args) . '"' . $svg_post;
	259	}
	260
	261	sub dom_info_str {
	262	my ($raw_dom_info) = @_;
	263
	264	my $dom_str = "";
	265
	266	unless ($raw_dom_info) { return "";}
	267
	268	my @raw_doms = split('\|',$raw_dom_info);
	269	shift(@raw_doms);
	270
	271	for my $dom ( @raw_doms ) {
	272	my $tmp_dom = $dom;
	273	$tmp_dom =~ s/^DX:/qDomain:\t/g;
	274	$tmp_dom =~ s/^XD:/lDomain:\t/g;
	275	$tmp_dom =~ s/;C=/\t/g;
	276
	277	$dom_str .= "$tmp_dom\n";
	278	}
	279
	280	return $dom_str;
	281	}
	282
	283
	284	__END__
	285
	286	=pod
	287
	288	=head1 NAME
	289
	290	merge_blast_btab.pl
	291
	292	=head1 SYNOPSIS
	293
	294	merge_blast_btab.pl --btab_file=result.b_tab result.html
	295
	296	=head1 OPTIONS
	297
	298	-h short help
	299	--help include description
	300
	301	--btab_file|--btab file_name -- blast tabular output file with
	302	sub-alignment scoring
	303
	304	=head1 DESCRIPTION
	305
	306	C<merge_blast_btab.pl> merges the domain annotations and sub-alignment scoring from the C<annot_blast_btop2.pl> blast tabular output file with a conventional blast result file.
	307
	308	The tab file is read and parsed, and then the subject/query seqid is used to
	309	capture domain locations in the subject/query sequence. If the domains
	310	overlap the aligned region, the domain names are appended to the output.
	311
	312	=head1 AUTHOR
	313
	314	William R. Pearson, wrp@virginia.edu
	315
	316	=cut
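
For readers more at home in Python, the annotation field handled by `parse_annots` above can be parsed along the same lines. This is a minimal Python 3 sketch, not part of the commit. It assumes the `|RX:...;C=name` layout shown in the comments of `rename_exons.py`, and it keeps values as strings, leaving numeric conversion to the caller.

```python
import re

def parse_annots(annot_str):
    """Parse '|RX:1-38:3-40:s=37;b=17.0;I=0.289;Q=15.9;C=exon_1~1|XR:...'."""
    annots = []
    if not annot_str or not annot_str.startswith("|"):
        return annots
    for entry in annot_str.split("|")[1:]:
        if not re.match(r"[XR][RX]", entry):
            continue                       # skip anything that is not a region entry
        fields = entry.split(";")
        tag, q_range, s_range, score = fields[0].split(":")
        rec = {
            "target": "subj" if tag == "XR" else "query",
            "coord": f"{q_range}:{s_range}",
            "score": score.split("=")[1],
        }
        for f in fields[1:]:               # b=, I=, Q=, C= key/value pairs
            key, _, value = f.partition("=")
            rec[key] = value
        rec["name"] = rec.pop("C", "")     # C= carries the domain/exon name
        annots.append(rec)
    return annots

example = ("|RX:1-38:3-40:s=37;b=17.0;I=0.289;Q=15.9;C=exon_1~1"
           "|XR:1-67:3-69:s=115;b=52.8;I=0.373;Q=116.3;C=exon_1~1")
for rec in parse_annots(example):
    print(rec["target"], rec["coord"], rec["name"])
```

As in the Perl version, an `XR` tag marks a subject region and any other tag is treated as a query region.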

+380

-0

scripts/merge_fasta_btab.pl

	0	#!/usr/bin/env perl
	1
	2	################################################################
	3	# copyright (c) 2018 by William R. Pearson and The Rector &
	4	# Visitors of the University of Virginia */
	5	################################################################
	6	# Licensed under the Apache License, Version 2.0 (the "License");
	7	# you may not use this file except in compliance with the License.
	8	# You may obtain a copy of the License at
	9	#
	10	# http://www.apache.org/licenses/LICENSE-2.0
	11	#
	12	# Unless required by applicable law or agreed to in writing,
	13	# software distributed under this License is distributed on an "AS
	14	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	15	# express or implied. See the License for the specific language
	16	# governing permissions and limitations under the License.
	17	################################################################
	18
	19	################################################################
	20	# merge_fasta_btab.pl --btab .btab file html_file
	21	################################################################
	22
	23	################################################################
	24	# takes a standard (or <html>) output FASTA file and converts (or adds) labels using .btab information
	25	################################################################
	26
	27
	28	use warnings;
	29	use strict;
	30	use Getopt::Long;
	31	use Pod::Usage;
	32	use URI::Encode qw(uri_encode);
	33	use URI::Escape qw(uri_escape);
	34
	35	my ($btab_file, $have_qslen, $help, $shelp, $dom_info) = ("", 0, 0, 0, 0);
	36	my ($plot_url) = ("");
	37
	38	GetOptions(
	39	"btab_file|btab=s" => \$btab_file,
	40	"have_qslen|have_sqlen" => \$have_qslen,
	41	"have_qslen|have_sqlen!" => \$have_qslen,
	42	"domain_info|dom_info!" => \$dom_info,
	43	"plot_url=s"=> \$plot_url,
	44	"h|?" => \$shelp,
	45	"help" => \$help,
	46	);
	47
	48	pod2usage(1) if $shelp;
	49	pod2usage(exitstatus => 0, verbose => 2) if $help;
	50	unless (-f STDIN || -p STDIN || @ARGV) {
	51	pod2usage(1);
	52	}
	53
	54	# require a btab file
	55
	56	# read it in, save structure as list/hash on accession (list more robust)
	57	# what happens with multiple hits for same library -- need to add code
	58	#
	59
	60	my @bl_fields = qw(q_seqid s_seqid percid alen mismatch gopen q_start q_end s_start s_end evalue bits score annot);
	61
	62	if ($have_qslen) {
	63	@bl_fields = qw(q_seqid q_len s_seqid s_len percid alen mismatch gopen q_start q_end s_start s_end evalue bits score annot);
	64	}
	65
	66	my %pgm_names= ('FASTA'=>'fap', 'FASTX'=>'fx', 'FASTY'=>'fy', 'FASTS'=>'fs', 'FASTM'=>'fm',
	67	'SSEARCH' => 'gsw', 'GGSEARCH'=>'gnw', 'GLSEARCH'=>'lnw',
	68	'TFASTX' => 'tfx', 'TFASTY'=>'tfy', 'TFASTS'=>'tfs', 'TFASTM'=>'tfm',
	69	'BLASTP'=>'bp', 'BLASTN'=>'bn', 'TBLASTN'=>'tbn' );
	70
	71	if ($dom_info) {
	72	push @bl_fields, "dom_info";
	73	}
	74
	75	my $pgm_name = '';
	76	my %tab_data = ();
	77	my @sseq_ids = ();
	78
	79	unless ($btab_file) {
	80	die "--btab_file required"
	81	}
	82	else {
	83	# read in btab file
	84	open(my $fd, $btab_file) || die "cannot open $btab_file";
	85
	86	while (my $line = <$fd>) {
	87	if ($line =~ m/^#/) { # check for program name
	88	if (!$pgm_name) {
	89	my ($name) = ($line =~ m/^# (\w+) /);
	90	if ($name && $pgm_names{$name}) {
	91	$pgm_name = $pgm_names{$name};
	92	}
	93	}
	94	next;
	95	}
	96	chomp($line);
	97
	98	my %a_data = ();
	99	@a_data{@bl_fields} = split(/\t/,$line);
	100
	101	# here we should confirm that the sseqid is new. If it is not, then add to a list.
	102	my $sseqid = $a_data{'s_seqid'};
	103
	104	if (defined($tab_data{$sseqid})) {
	105	push @{$tab_data{$sseqid}}, \%a_data
	106	}
	107	else {
	108	$tab_data{$sseqid} = [\%a_data ];
	109	push @sseq_ids, $sseqid;
	110	}
	111	}
	112	}
	113
	114	# have the annotation data in %tab_data and @sseq_ids
	115	# read in the FASTA html file and annotate it
	116
	117	my ($in_best, $in_align, $in_annot) = (0,0,0);
	118	my ($annot_id) = ("");
	119	my ($best_ix, $align_ix, $hsp_ix) = (0,0,0);
	120
	121	while (my $line = <>) {
	122	chomp($line);
	123	unless ($line) {
	124	print "\n";
	125	next;
	126	}
	127	if ($line =~ m/^The best scores are:/) {
	128	$in_best = 1;
	129	$best_ix = 0;
	130	print "$line\n";
	131	next;
	132	}
	133
	134	if ($in_best) {
	135	if ($line =~ /<pre>>>/) {
	136	$in_best = 0;
	137	$in_align = 1;
	138	$in_annot = 0;
	139	$align_ix = 0;
	140	$hsp_ix = 0;
	141	# print out the first line
	142	print "$line\n";
	143	next;
	144	}
	145	else {
	146	if (scalar(@sseq_ids) && $sseq_ids[$best_ix]) {
	147	$line = add_best($line, $tab_data{$sseq_ids[$best_ix]}->[0]);
	148	$best_ix++;
	149	}
	150	}
	151	}
	152
	153	if ($in_align) {
	154	if ($line =~ m/^<!\-\- ANNOT_START "([^"]+)" \-\->/) {
	155	$annot_id = $1;
	156	my $regions_str = regions_to_str($tab_data{$sseq_ids[$align_ix]}->[$hsp_ix]);
	157	print qq(<!-- ANNOT_START "$annot_id" -->);
	158	print $regions_str;
	159
	160	if ($plot_url) {
	161	my $raw_dom_str = "";
	162	if ($dom_info) {
	163	$raw_dom_str = dom_info_str($tab_data{$sseq_ids[$align_ix]}->[$hsp_ix]{'dom_info'});
	164	}
	165
	166	my $plot_tag = plot_tag_str($plot_url, $pgm_name, $tab_data{$sseq_ids[$align_ix]}->[$hsp_ix], $regions_str, $raw_dom_str);
	167	if ($plot_tag) {print $plot_tag,"\n";}
	168	}
	169
	170	$hsp_ix++;
	171
	172	# remove the old domain information
	173	while ($line = <> ) {
	174	chomp($line);
	175	if ($line !~ m/^\s*q?Region:/ && $line !~ /ANNOT_STOP/) {
	176	print "$line\n";
	177	}
	178	if ($line =~ m/^<!\-\- ANNOT_STOP \-\->/) {
	179	last;
	180	}
	181	}
	182	}
	183	elsif ($line =~ m/<pre>>>/) {
	184	$align_ix++;
	185	$hsp_ix=0;
	186	}
	187	}
	188
	189	print "$line\n";
	190	}
	191
	192	sub parse_annots {
	193	my ($annot_str) = @_;
	194
	195	my @annot_list = ();
	196
	197	unless ($annot_str && $annot_str =~ m/^\|/) {
	198	return \@annot_list;
	199	}
	200
	201	my @annots = split('\|',$annot_str);
	202	shift @annots;
	203
	204	for my $annot ( @annots ) {
	205	my %annot_data = ();
	206	next unless ($annot =~ m/^[XR][RX]/);
	207	my @a_fields = split(/;/,$annot);
	208	for my $f (@a_fields) {
	209	if ($f =~ m/^[XR][XR]/) {
	210	my @a2_f = split(':',$f);
	211	if ($a2_f[0] =~ m/^XR/) {
	212	$annot_data{target} = 'subj';
	213	}
	214	else {
	215	$annot_data{target} = 'query';
	216	}
	217	$annot_data{coord} = "$a2_f[1]:$a2_f[2]";
	218	$annot_data{score} = (split('=',$a2_f[3]))[1]
	219	}
	220	elsif ($f =~ m/(\w)=(.+)/) {
	221	$annot_data{$1} = $2;
	222	}
	223	}
	224	$annot_data{name} = $a_fields[-1];
	225	$annot_data{name} =~ s/^C=//;
	226
	227	push @annot_list, \%annot_data;
	228	}
	229	return \@annot_list;
	230	}
	231
	232	sub print_regions {
	233	my ($annot_id, $annot_ref) = @_;
	234
	235	my $region_str = "";
	236
	237	print qq(<!-- ANNOT_START "$annot_id" -->);
	238
	239	for my $annot ( @{$annot_ref}) {
	240	if ($annot->{target} =~ m/^q/) {
	241	$region_str = "qRegion";
	242	}
	243	else {
	244	$region_str = " Region";
	245	}
	246
	247	printf "%s: %s : score=%d; bits=%.1f; Id=%.3f; Q=%.1f : %s\n", $region_str,
	248	@{$annot}{qw(coord score b I Q name)};
	249	}
	250	}
	251
	252	sub regions_to_str {
	253	my ($a_data_r) = @_;
	254
	255	my $annot_ref = parse_annots($a_data_r->{annot});
	256
	257	my $region_str = "";
	258	my $annot_str = "";
	259
	260	for my $annot ( @{$annot_ref}) {
	261	if ($annot->{target} =~ m/^q/) {
	262	$region_str = "qRegion";
	263	}
	264	else {
	265	$region_str = " Region";
	266	}
	267
	268	$annot_str .= sprintf "%s: %s : score=%d; bits=%.1f; Id=%.3f; Q=%.1f : %s\n", $region_str,
	269	@{$annot}{qw(coord score b I Q name)};
	270	}
	271	return $annot_str;
	272	}
	273
	274	sub add_best {
	275	my ($line, $a_data) = @_;
	276
	277	my $annot_str = '';
	278
	279	my $annot_refs = parse_annots($a_data->{annot});
	280
	281	# remove old annotation if present
	282	my @line_words = split(/\s/,$line);
	283	if ($line_words[-1] =~ m/~\d/) {
	284	$line = join(' ',@line_words[0 .. $#line_words-1]);
	285	}
	286
	287	for my $annot ( @$annot_refs) {
	288	if ($annot->{target} !~ m/^q/) {
	289	$annot_str .= $annot->{name} . ";"
	290	}
	291	}
	292
	293	if ($annot_str) {
	294	return "$line $annot_str";
	295	}
	296	else {
	297	return $line;
	298	}
	299	}
	300
	301	sub plot_tag_str {
	302
	303	my ($plot_script, $pgm_name, $align_data_r, $regions_str, $doms_str) = @_;
	304
	305	my $svg_pref = q(<object type="image/svg+xml" );
	306	my $svg_post = q( width="660" height="76" ></object>);
	307
	308	#build argument string
	309	my %plt_args = ();
	310	@plt_args{qw(pgm q_cstart l_cstart)} = ($pgm_name, 1, 1);
	311	@plt_args{qw(q_name q_cstop q_astart q_astop l_name l_cstop l_astart l_astop)} =
	312	@{$align_data_r}{qw(q_seqid q_len q_start q_end s_seqid s_len s_start s_end)};
	313	$plt_args{'regions'}= uri_escape(uri_encode($regions_str));
	314	if ($doms_str) {
	315	$plt_args{'doms'} = uri_encode($doms_str);
	316	}
	317
	318	my $dom_info = ();
	319
	320	my @args = map {"$_=$plt_args{$_}"} keys(%plt_args);
	321
	322	return $svg_pref . qq( data="$plot_script?) . join('&',@args) . '"' . $svg_post;
	323	}
	324
	325	sub dom_info_str {
	326	my ($raw_dom_info) = @_;
	327
	328	my $dom_str = "";
	329
	330	unless ($raw_dom_info) { return "";}
	331
	332	my @raw_doms = split('\|',$raw_dom_info);
	333	shift(@raw_doms);
	334
	335	for my $dom ( @raw_doms ) {
	336	my $tmp_dom = $dom;
	337	$tmp_dom =~ s/^DX:/qDomain:\t/g;
	338	$tmp_dom =~ s/^XD:/lDomain:\t/g;
	339	$tmp_dom =~ s/;C=/\t/g;
	340
	341	$dom_str .= "$tmp_dom\n";
	342	}
	343
	344	return $dom_str;
	345	}
	346
	347	__END__
	348
	349	=pod
	350
	351	=head1 NAME
	352
	353	merge_fasta_btab.pl
	354
	355	=head1 SYNOPSIS
	356
	357	merge_fasta_btab.pl --btab_file=result.b_tab result.html
	358
	359	=head1 OPTIONS
	360
	361	-h short help
	362	--help include description
	363
	364	--btab_file|--btab file_name -- blast tabular output file with
	365	sub-alignment scoring
	366
	367	=head1 DESCRIPTION
	368
	369	C<merge_fasta_btab.pl> merges the domain annotations and sub-alignment scoring from the C<annot_blast_btop2.pl> blast tabular output file with a conventional FASTA result file.
	370
	371	The tab file is read and parsed, and then the subject/query seqid is used to
	372	capture domain locations in the subject/query sequence. If the domains
	373	overlap the aligned region, the domain names are appended to the output.
	374
	375	=head1 AUTHOR
	376
	377	William R. Pearson, wrp@virginia.edu
	378
	379	=cut
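
Both merge scripts above load the tabular file into a hash of lists keyed on the subject identifier (`%tab_data`) and keep a separate ordered list of identifiers (`@sseq_ids`). That lets the HTML pass walk hits in file order and index HSPs within each hit. The same structure can be sketched in Python 3 as follows. This is an illustration only, not part of the commit; `read_btab()` and the three-field layout are hypothetical simplifications of `@bl_fields`.

```python
def read_btab(lines, fields):
    """Group tab-separated hit lines by subject id, keeping first-seen order."""
    tab_data = {}   # subject id -> list of HSP records (dicts)
    sseq_ids = []   # subject ids in order of first appearance
    for line in lines:
        if line.startswith("#"):
            continue                      # comment lines carry no hit data
        row = dict(zip(fields, line.rstrip("\n").split("\t")))
        sseqid = row["s_seqid"]
        if sseqid not in tab_data:
            tab_data[sseqid] = []
            sseq_ids.append(sseqid)
        tab_data[sseqid].append(row)      # a repeated subject id adds another HSP
    return tab_data, sseq_ids

# Made-up input, for illustration only
fields = ["q_seqid", "s_seqid", "percid"]
lines = ["# comment line\n", "q1\tsA\t90.0\n", "q1\tsB\t50.0\n", "q1\tsA\t40.0\n"]
tab_data, sseq_ids = read_btab(lines, fields)
print(sseq_ids, len(tab_data["sA"]))
```

Keeping the ordered list separate mirrors the Perl code, where `$sseq_ids[$align_ix]` selects the hit and `->[$hsp_ix]` selects the HSP within it.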

+154

-0

scripts/relabel_domains.py

	0	#!/usr/bin/env python
	1
	2	# Given a blast_tabular file with search results from one or more protein queries
	3	#
	4
	5	################################################################
	6	# copyright (c) 2018 by William R. Pearson and The Rector & Visitors
	7	# of the University of Virginia */
	8	# ###############################################################
	9	# Licensed under the Apache License, Version 2.0 (the "License"); you
	10	# may not use this file except in compliance with the License. You
	11	# may obtain a copy of the License at
	12	# http://www.apache.org/licenses/LICENSE-2.0 Unless required by
	13	# applicable law or agreed to in writing, software distributed under
	14	# this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
	15	# OR CONDITIONS OF ANY KIND, either express or implied. See the
	16	# License for the specific language governing permissions and
	17	# limitations under the License.
	18	# ###############################################################
	19
	20
	21	import fileinput
	22	import sys
	23	import re
	24	import argparse
	25	import urllib2
	26
	27	from rename_exons import *
	28
	29	def replace_dom_number(line):
	30
	31	    out_str = ''
	32	    if (not re.search(r'~',line)):
	33	        return line
	34
	35	    (info, num, vdom) = re.search(r'^([^~]+)~(\d+)(v?)$',line).groups()
	36	    if (vdom is None):
	37	        vdom=''
	38
	39	    if (num in homolog_dict):
	40	        return "%s~h%s%s" % (info, str(homolog_dict[num]['num']), vdom)
	41
	42	    else:
	43	        name = line.split(" ")[-1].split("{")[0]
	44	        if (name == "NODOM"):
	45	            return line
	46	        else:
	47	            if (name in nonhomolog_dict):
	48	                return '~'.join(line.split('~')[:-1]) + "~" + str(nonhomolog_dict[name])
	49	    return out_str
	50
	51
	52	################
	53	# __main__ function
	54	#
	55
	56	e_thresh = 1e-6
	57	q_thresh = 30.0
	58
	59	homolog_dict = {}
	60	nonhomolog_dict = {}
	61
	62	def main():
	63
	64	    # print "#"," ".join(sys.argv)
	65
	66	    hom_color = 1
	67	    n_hom_color = 11
	68
	69	    parser=argparse.ArgumentParser(description='relabel_domains.py result_file.m8CB')
	70
	71	    parser.add_argument('--have_qslen', help='bl_tab fields include query/subject lengths',dest='have_qslen',action='store_true',default=False)
	72	    parser.add_argument('--dom_info', help='raw domain coordinates included',action='store_true',default=False)
	73	    parser.add_argument('files', metavar='FILE', nargs='*', help='files to read, if empty, stdin is used')
	74
	75	    args=parser.parse_args()
	76
	77	    end_field = -1
	78	    data_fields_reset=False
	79
	80	    (fields, end_field) = set_data_fields(args, [])
	81
	82	    if (args.have_qslen and args.dom_info):
	83	        data_fields_reset=True
	84
	85
	86	    for line in fileinput.input(args.files):
	87	        # pass through comments
	88	        if (line[0] == '#'):
	89	            print line, # ',' because have not stripped
	90	            continue
	91
	92	        ################
	93	        # break up tab fields, check for extra fields
	94	        line = line.strip('\n')
	95	        line_data = line.split('\t')
	96	        if (not data_fields_reset): # look for --have_qslen number, --dom_info data, even if not set
	97	            (fields, end_field) = set_data_fields(args, line_data)
	98	            data_fields_reset = True
	99
	100	        ################
	101	        # get exon annotations
	102	        data = parse_protein(line_data,fields,'') # get score/alignment/domain data
	103
	104	        if (len(data['sdom_list'])==0 and len(data['qdom_list'])==0):
	105	            print line # no domains to be edited, print stripped line and continue
	106	            continue
	107
	108	        ################
	109	        # have domains, check if significant and new, or old and known
	110	        # goals are: (1) consistent coloring between query and subject for same domain
	111	        # (2) homologous domains get special labels
	112	        # need dict of good domain names
	113
	114	        ################
	115	        # check to update doms with good E()-value
	116	        # (dicts are keyed on the domain name itself, not the literal string 'q_dom.name')
	117	        if float(data['evalue']) <= e_thresh:
	118	            for q_dom in data['qdom_list']:
	119	                if (float(q_dom.q_score) >= q_thresh and q_dom.name not in homolog_dict ):
	120	                    homolog_dict[q_dom.name] = q_dom.color
	121	                    hom_color += 1
	122
	123	            for s_dom in data['sdom_list']:
	124	                if (float(s_dom.q_score) >= q_thresh and s_dom.name not in homolog_dict):
	125	                    homolog_dict[s_dom.name] = s_dom.color
	126	                    hom_color += 1
	127	        else:
	128	            for s_dom in data['sdom_list']:
	129	                if (s_dom.name not in homolog_dict):
	130	                    nonhomolog_dict[s_dom.name] = s_dom.color
	131	                    n_hom_color += 1
	132
	133
	134	        ################
	135	        # done storing good domains, write things out
	136
	137	        btab_str = '\t'.join(str(data[x]) for x in fields[:end_field])
	138
	139	        for s_dom in data['sdom_list']:
	140	            if (s_dom.name in homolog_dict):
	141	                s_dom.color=homolog_dict[s_dom.name]
	142	            elif (s_dom.name in nonhomolog_dict):
	143	                s_dom.color=nonhomolog_dict[s_dom.name]
	144
	145
	146	        dom_bar_str = ''
	147	        for dom in sorted(data['qdom_list']+data['sdom_list'],key=lambda r: r.idnum):
	148	            dom_bar_str += dom.make_bar_str()
	149
	150	        print btab_str+dom_bar_str
	151
	152
	153	if __name__ == '__main__':
	154	    main()
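
The stated goal of `relabel_domains.py` above is consistent coloring for the same domain across hits. One way to picture that is a name-to-color registry in which the first color seen for a domain name is reused for later occurrences. The following is a minimal Python 3 sketch under that assumption, not the script's exact behavior: `harmonize_colors()` and the dictionary-based domain records are hypothetical, and the thresholds mirror `e_thresh` and `q_thresh` from the script.

```python
def harmonize_colors(hits, e_thresh=1e-6, q_thresh=30.0):
    """hits: list of (evalue, [ {'name':..., 'q_score':..., 'color':...}, ... ])."""
    homolog = {}      # names seen in a significant hit with a good Q score
    nonhomolog = {}   # names first seen in a non-significant hit
    for evalue, domains in hits:
        # first pass: record colors for names not yet registered as homologous
        for dom in domains:
            name = dom["name"]
            if name in homolog:
                continue
            if evalue <= e_thresh:
                if dom["q_score"] >= q_thresh:
                    homolog[name] = dom["color"]
            else:
                nonhomolog.setdefault(name, dom["color"])
        # second pass: relabel this hit's domains from the registries
        for dom in domains:
            name = dom["name"]
            if name in homolog:
                dom["color"] = homolog[name]
            elif name in nonhomolog:
                dom["color"] = nonhomolog[name]
    return homolog, nonhomolog

# Made-up hits, for illustration only
hits = [
    (1e-20, [{"name": "PF1", "q_score": 80.0, "color": 1}]),
    (1e-10, [{"name": "PF1", "q_score": 45.0, "color": 3}]),
    (0.5,   [{"name": "PF2", "q_score": 0.0, "color": 4}]),
    (2.0,   [{"name": "PF2", "q_score": 0.0, "color": 7}]),
]
homolog, nonhomolog = harmonize_colors(hits)
print(homolog, nonhomolog, [d["color"] for _, doms in hits for d in doms])
```

The homologous registry takes precedence, so a domain that has appeared in a significant, well-scored alignment keeps its color even when it later turns up in a weak hit.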

+797

-0

scripts/rename_exons.py

	0	#!/usr/bin/env python
	1	#
	2	# given a -m8CB file with exon annotations for the query and subject,
	3	# adjust the subject exon names to match the query exon names
	4
	5	################################################################
	6	# copyright (c) 2018 by William R. Pearson and The Rector &
	7	# Visitors of the University of Virginia */
	8	################################################################
	9	# Licensed under the Apache License, Version 2.0 (the "License");
	10	# you may not use this file except in compliance with the License.
	11	# You may obtain a copy of the License at
	12	#
	13	# http://www.apache.org/licenses/LICENSE-2.0
	14	#
	15	# Unless required by applicable law or agreed to in writing,
	16	# software distributed under this License is distributed on an "AS
	17	# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
	18	# express or implied. See the License for the specific language
	19	# governing permissions and limitations under the License.
	20	################################################################
	21
	22	import fileinput
	23	import sys
	24	import re
	25	import argparse
	26	import copy
	27
	28	################
	29	# "domain" class that describes a domain/exon alignment annotation
	30	#
	31	class DomAlign:
	32	def __init__(self, name, info, color, qstart, qend, sstart, send, raw_score, bit_score, ident, qscore, RXRState, fulltext):
	33	self.name = name
	34	self.info = info
	35	self.color_type = ''
	36	if (not re.search(r'^\d+$',color)):
	37	m=re.search(r'^(\d+)([a-z]?\w*)$',color)
	38	if (m):
	39	(self.color, self.color_type) = m.groups()
	40	self.color = int(self.color)
	41	else:
	42	self.color = int(color)
	43
	44	self.q_start = qstart
	45	self.q_end = qend
	46	self.s_start = sstart
	47	self.s_end = send
	48	self.raw_score = raw_score
	49	self.bit_score = bit_score
	50	self.percid = ident
	51	self.q_score = qscore
	52	self.rxr = RXRState
	53	self.idnum = 0
	54	self.overlap_list = []
	55	self.info_dom = None
	56	self.text = fulltext
	57	self.out_str = ''
	58	self.over_cnt = 0
	59
	60	def append_overlap(self, overlap_dict):
	61	self.overlap_list.append(overlap_dict)
	62
	63	def __str__(self):
	64	# return "[%d]name: %s : %i-%i : %i-%i I=%.1f Q=%.1f %s" % (self.idnum, self.name, self.q_start, self.q_end, self.s_start, self.s_end, self.percid, self.q_score, self.rxr)
	65	return "[%d:%s] %i-%i:%i-%i::%s [over:%d]" % (self.idnum, self.rxr, self.q_start, self.q_end, self.s_start, self.s_end, self.name,len(self.overlap_list))
	66
	67	def print_bar_str(self): # checking for 'NADA'
	68	if (not self.out_str):
	69	self.out_str = self.text
	70	return str("|%s"%(self.out_str))
	71
	72	def make_bar_str(self): # create original string from values
	73	bar_str = "|%s:%d-%d:%d-%d:s=%d;b=%.1f;I=%.3f;Q=%.1f;C=%s%s~%d" % (
	74	self.rxr, self.q_start, self.q_end, self.s_start, self.s_end,
	75	self.raw_score, self.bit_score, self.percid, self.q_score, self.name, self.info, self.color)
	76
	77	if (self.color_type):
	78	bar_str += self.color_type
	79	return bar_str
	80
	81	################
	82	# "exonInfo" class describes raw (un-aligned) exons with genome coordinates
	83	#
	84	class exonInfo:
	85	def __init__(self, name, q_target, p_start, p_end, chrom, d_start, d_end, full_text):
	86	self.name = name
	87	self.q_target = q_target
	88	self.p_start = p_start
	89	self.p_end = p_end
	90	self.chrom = chrom
	91	self.d_start = d_start
	92	self.d_end = d_end
	93	self.text = full_text
	94	self.plus_strand = True
	95	if (d_start > d_end):
	96	self.plus_strand = False
	97
	98	def __str__(self):
	99	rxr_str = "XD"
	100	if (self.q_target):
	101	rxr_str="DX"
	102	return '|%s:%i-%i:%s{%s:%i-%i}' % (rxr_str, self.p_start, self.p_end, self.name, self.chrom, self.d_start, self.d_end)
	103
	104
	105	# Parses domain annotations after split at '|'
	106	#|RX:1-38:3-40:s=37;b=17.0;I=0.289;Q=15.9;C=exon_1~1
	107	#|RX:39-67:41-69:s=78;b=35.8;I=0.483;Q=68.7;C=exon_2~2
	108	#|XR:1-67:3-69:s=115;b=52.8;I=0.373;Q=116.3;C=exon_1~1
	109	#|RX:68-117:72-113:s=14;b=6.4;I=0.385;Q=0.0;C=exon_3~3
	110	#|XR:68-124:70-119:s=-11;b=0.0;I=0.378;Q=0.0;C=exon_2~2
	111	#|XR:125-167:120-165:s=39;b=17.9;I=0.429;Q=18.5;C=exon_3~3
	112	#|RX:118-176:114-175:s=24;b=11.0;I=0.411;Q=1.5;C=exon_4~4
	113	#|RX:177-200:176-198:s=27;b=12.4;I=0.435;Q=4.0;C=exon_5~5
	114	#|XR:168-200:166-198:s=12;b=5.5;I=0.419;Q=0.0;C=exon_4~4
	115	#
	116	def parse_domain(text):
	117	# takes a domain in string form, turns it into a domain object
	118	# looks like: RX:5-82:5-82:s=397;b=163.1;I=1.000;Q=453.6;C=C.Thioredoxin~1
	119	# could also look like: RX:5-82:5-82:s=397;b=163.1;I=1.000;Q=453.6;C=C.Thioredoxin{PF012445}~1
	120
	121	# get RX/XR and qstart/qstop sstart/sstop as strings
	122	m = re.search(r'^(\w+):(\d+)-(\d+):(\d+)-(\d+):',text)
	123	if (m):
	124	(RXRState, qstart_s, qend_s, sstart_s, send_s) = m.groups()
	125	else:
	126	sys.stderr.write("could not parse location: %s\n"%(text))
	127
	128	# get score, bits, identity, Q info
	129	m = re.search(r's=(\-?\d+);b=(\-?[\d\.]+);I=([\d\.]+);Q=(\-?\d+\.\d*);',text)
	130	if (m):
	131	(r_score_s, b_score_s, ident_s, qscore_s) = m.groups()
	132	else:
	133	sys.stderr.write("Error: no scores: %s\n" %(text))
	134	r_score_s = b_score_s = qscore_s = "-1.0"
	135
	136	# get domain name/color (and possibly {info})
	137
	138	(name, color_s) = re.search(r';C=([^~]+)~(.+)$',text).groups()
	139	info_s=""
	140
	141	if (re.search(r'\}$',name)):
	142	(name, info_s) = re.search(r'([^\{]+)(\{[^\}]+\})$',name).groups()
	143
	144	dom_align = DomAlign(name, info_s, color_s, int(qstart_s), int(qend_s), int(sstart_s), int(send_s),
	145	int(r_score_s), float(b_score_s), float(ident_s),float(qscore_s), RXRState, text)
	146
	147	return dom_align
	148
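As a rough illustration of what `parse_domain()` extracts, the following minimal sketch applies the same three regular expressions to one annotation string and returns a plain dictionary. The function name `parse_annotation` and the dictionary keys are hypothetical and are not part of the script. Unlike the script, the sketch raises `ValueError` when any part fails to match.

```python
import re

# Hypothetical helper (not in scan_exons.py): parse one annotation that has
# already been split off at '|', using the same patterns as parse_domain().
def parse_annotation(text):
    loc = re.search(r'^(\w+):(\d+)-(\d+):(\d+)-(\d+):', text)
    scores = re.search(r's=(-?\d+);b=(-?[\d.]+);I=([\d.]+);Q=(-?\d+\.\d*);', text)
    label = re.search(r';C=([^~]+)~(.+)$', text)
    if not (loc and scores and label):
        raise ValueError("cannot parse annotation: %s" % text)

    name, color = label.groups()
    info = ""
    # an optional trailing {info} block belongs to the name field
    m = re.search(r'^([^{]+)(\{[^}]+\})$', name)
    if m:
        name, info = m.groups()

    state, qs, qe, ss, se = loc.groups()
    raw, bits, ident, q = scores.groups()
    return {
        "state": state,                    # 'RX' or 'XR'
        "q_range": (int(qs), int(qe)),     # query coordinates
        "s_range": (int(ss), int(se)),     # subject coordinates
        "raw_score": int(raw),
        "bit_score": float(bits),
        "identity": float(ident),
        "q_score": float(q),
        "name": name,
        "info": info,
        "color": color,
    }
```

Calling it on the `C.Thioredoxin{PF012445}~1` example from the comments separates the name `C.Thioredoxin` from the info block `{PF012445}`.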
	149	# dom_info is like domain, but no scores
	150	################
	151	# exon_info is like domain, but no scores
	152	#
	153	def parse_exon_info(text):
	154	# takes a domain in string form, turns it into a domain object
	155	# looks like: DX:1-100;C=C.Thioredoxin~1
	156
	157	(RXRState, start_s, end_s,name, color) = re.search(r'^(\w+):(\d+)-(\d+);C=([^~]+)~(.*)$',text).groups()
	158	info = ""
	159	if (re.search(r'\}$',name)):
	160	(name, info) = re.search(r'([^\{]+)(\{[^\}]+\})$',name).groups()
	161
	162	gene_re = re.search(r'^\{([\w\.]+):(\d+)\-(\d+)\}',info)
	163	if (gene_re):
	164	(chrom, d_start, d_end) = gene_re.groups()
	165	else:
	166	(chrom, d_start, d_end) = ('',-1,-1)
	167	# sys.stderr.write("genome info not found: %s\n" % (text))
	168
	169	q_target = True;
	170	if (RXRState == 'XD'):
	171	q_target = False
	172
	173	exon_info = exonInfo(name, q_target, int(start_s), int(end_s), chrom, int(d_start), int(d_end), text)
	174
	175	return exon_info
	176
	177	def overlap_fract(qdom, sdom):
	178	# checks if a query and subject domain overlap
	179	# if they do, return the amount of overlap with respect to each domain
	180	# how much of query is covered by subject, how much of subject is covered by query
	181
	182	q_overlap = 0.0
	183	s_overlap = 0.0
	184
	185	qq_len = qdom.q_end-qdom.q_start+1 # query alignment length in query coordinates
	186	qs_len = qdom.s_end-qdom.s_start+1 # query alignment length in subj coordinates
	187	sq_len = sdom.q_end-sdom.q_start+1 # subj alignment length in query coordinates
	188	ss_len = sdom.s_end-sdom.s_start+1 # subj alignment length in subject coordinates
	189
	190	case = -1
	191
	192	# case (0) no overlap at all
	193	if (qdom.q_end < sdom.q_start or sdom.s_end < qdom.s_start or qdom.q_start > sdom.q_end or sdom.q_start > qdom.q_end) :
	194	case = 0
	195	q_overlap = s_overlap = 0.0
	196	# case (1) query surrounds subject
	197	elif (qdom.q_start <= sdom.q_start and qdom.q_end >= sdom.q_end):
	198	case = 1
	199	s_overlap = 1.0
	200	q_overlap = float(sq_len)/qq_len
	201	# case (2) subject surrounds query
	202	elif (sdom.s_start <= qdom.s_start and sdom.s_end >= qdom.s_end):
	203	case = 2
	204	q_overlap = 1.0
	205	s_overlap = float(qs_len)/ss_len
	206	# case (3) query left of subject
	207	elif (qdom.q_start <= sdom.q_start and qdom.q_end <= sdom.q_end):
	208	case = 3
	209	q_overlap = float(qdom.q_end-sdom.q_start+1)/qq_len
	210	s_overlap = float(qdom.s_end-sdom.s_start+1)/ss_len
	211	# case (4) subject left of query
	212	elif (sdom.s_start <= qdom.s_start and sdom.s_end <= qdom.s_end):
	213	case = 4
	214	q_overlap = float(sdom.q_end-qdom.q_start+1)/qq_len
	215	s_overlap = float(sdom.s_end-qdom.s_start+1)/ss_len
	216
	217	if (q_overlap > 1.0 or s_overlap > 1.0):
	218	if (1):
	219	sys.stderr.write("***%i: qdom: %s sdom: %s\n"% (case,str(qdom),str(sdom)))
	220	sys.stderr.write(" ** qover %.3f sover: %.3f\n"% (q_overlap, s_overlap))
	221	sys.stderr.write(" ** qq_len: %d qs_len: %d ss_len: %d sq_len %d\n"%(qq_len, qs_len, ss_len, sq_len))
	222
	223	return (q_overlap, s_overlap)
	224
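For comparison with the case analysis in `overlap_fract()`, here is a minimal sketch of the same idea in a single coordinate system: the shared length of two inclusive intervals divided by each interval's own length. It is a simplification, not the script's method, since `overlap_fract()` mixes query and subject coordinates. The name `overlap_fractions` is hypothetical.

```python
# Hypothetical single-axis version: both intervals are inclusive and
# expressed in the same coordinate system.
def overlap_fractions(a_start, a_end, b_start, b_end):
    """Return (fraction of a covered by b, fraction of b covered by a)."""
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    if shared <= 0:
        return (0.0, 0.0)          # disjoint intervals
    a_len = a_end - a_start + 1
    b_len = b_end - b_start + 1
    return (float(shared) / a_len, float(shared) / b_len)
```

Because the shared length can never exceed either interval's length, this form cannot return a fraction above 1.0, which is the condition the diagnostic block in `overlap_fract()` reports.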
	225	####
	226	# parse_protein(result_line)
	227	# takes a protein in string format, turns it into a dictionary properly
	228	# looks like: sp|P30711|GSTT1_HUMAN up|Q2NL00|GSTT1_BOVIN 86.67 240 32 0 1 240 1 240 1.4e-123 444.0 16VI7DR6IT3IR15KQ3AI6TI11TA7YH8RC12TA3SN10FL10QETM2AT6VMTA2LV2DG4ND6PS24EK6TA11DV14FSPQ5IL3LMML1WK5RQ |XR:4-76:4-76:s=327;b=134.6;I=0.895;Q=367.8;C=C.Thioredoxin~1|RX:5-82:5-82:s=356;b=146.5;I=0.902;Q=403.3;C=C.Thioredoxin~1|RX:83-93:83-93:s=52;b=21.4;I=0.818;Q=30.9;C=NODOM~0|XR:77-93:77-93:s=86;b=35.4;I=0.882;Q=72.6;C=NODOM~0|RX:94-110:94-110:s=88;b=36.2;I=0.882;Q=75.0;C=vC.GST_C~2v|XR:94-110:94-110:s=88;b=36.2;I=0.882;Q=75.0;C=vC.GST_C~2v|RX:111-201:111-201:s=409;b=168.3;I=0.868;Q=468.3;C=C.GST_C~2|XR:111-201:111-201:s=409;b=168.3;I=0.868;Q=468.3;C=C.GST_C~2|RX:202-240:202-240:s=154;b=63.4;I=0.795;Q=155.9;C=NODOM~0|XR:202-240:202-240:s=154;b=63.4;I=0.795;Q=155.9;C=NODOM~0
	229	#
	230	# returns [data[x] for x in fields] but also data['q/s_dom_list'] and data['q/sinfo_list']
	231	def parse_protein(line_data,fields, req_name):
	232	# last part (domain annotations) split('|') and parsed by parse_domain()
	233
	234	data = {}
	235	data = dict(zip(fields, line_data))
	236	if (re.search(r'\|',data['qseqid'])):
	237	data['qseq_acc'] = data['qseqid'].split('|')[1]
	238	else:
	239	data['qseq_acc'] = data['qseqid']
	240
	241	if (re.search(r'\|',data['sseqid'])):
	242	data['sseq_acc'] = data['sseqid'].split('|')[1]
	243	else:
	244	data['sseq_acc'] = data['sseqid']
	245
	246	Qdom_list = []
	247	Sdom_list = []
	248
	249	Qinfo_list = []
	250	Sinfo_list = []
	251
	252	counter = 0
	253
	254	if ('dom_annot' in data and len(data['dom_annot']) > 0):
	255	for dom_str in data['dom_annot'].split('|')[1:]:
	256	if (req_name and not re.search(req_name, dom_str)):
	257	continue
	258
	259	counter += 1
	260	dom = parse_domain(dom_str)
	261	dom.idnum = counter
	262	if (dom.rxr == 'RX'):
	263	Qdom_list.append(dom)
	264	else:
	265	Sdom_list.append(dom)
	266
	267	data['qdom_list'] = Qdom_list
	268	data['sdom_list'] = Sdom_list
	269
	270	if ('dom_info' in data and len(data['dom_info']) > 0):
	271	for info_str in data['dom_info'].split('|')[1:]:
	272	if (req_name and not re.search(req_name, info_str)):
	273	continue
	274	if (not re.search(r'^[DX][XD]',info_str)):
	275	continue
	276
	277	dinfo = parse_exon_info(info_str)
	278
	279	if (dinfo.q_target):
	280	Qinfo_list.append(dinfo)
	281	else:
	282	Sinfo_list.append(dinfo)
	283
	284
	285	# put links to info_list into dom_list so info_list names can
	286	# be changed -- give S/Qinfo's the S/Qdom ids of the overlapping domain
	287
	288	find_info_overlaps(Qinfo_list, Qdom_list)
	289	find_info_overlaps(Sinfo_list, Sdom_list)
	290
	291	data['qinfo_list'] = Qinfo_list
	292	data['sinfo_list'] = Sinfo_list
	293
	294	return data
	295
	296	# "domain" : RX:1-38:3-40:s=37;b=17.0;I=0.289;Q=15.9;C=exon_1~1
	297	# "name" : like exon_2
	298	# expanded for domain: RX:1-38:3-40:s=37;b=17.0;I=0.289;Q=15.9;C=exon_1{chr1:12345678-123456987}~1
	299	def replace_name(domain_text, new_name, new_color_s):
	300	out = "=".join(domain_text.split("=")[:-1]) # out has everything to last '='
	301
	302	old_name = domain_text.split(";C=")[-1]
	303	old_info=""
	304
	305	if (re.search(r'\}~',old_name)):
	306	(old_info)=re.search(r'(\{[^\}]+\})~',old_name).group(1)
	307
	308	if (not re.match(r'\d+',new_color_s)):
	309	new_color_s="0"
	310	out += "="+new_name+old_info+"~"+new_color_s # put it together
	311	return out
	312
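The effect of `replace_name()` is easiest to see on a concrete annotation: the name after `;C=` and the color after `~` are swapped out, while any `{...}` info block is carried over. The sketch below reproduces that behavior with a single regular expression. The function name `relabel` is hypothetical, and the regex is an alternative formulation rather than the script's own string splitting.

```python
import re

# Hypothetical equivalent of replace_name(): swap the name and color of an
# annotation, keeping an optional {info} block that follows the name.
def relabel(annotation, new_name, new_color="0"):
    m = re.search(r'^(.*;C=)([^{~]+)(\{[^}]+\})?~(.*)$', annotation)
    if m is None:
        raise ValueError("no ';C=name~color' field in: %s" % annotation)
    prefix, _old_name, info, _old_color = m.groups()
    if not re.match(r'\d+', new_color):
        new_color = "0"            # same fallback as replace_name()
    return prefix + new_name + (info or "") + "~" + new_color
```

For instance, relabeling `...;C=exon_1{chr1:12345678-123456987}~1` as `exon_X` with color `0` gives `...;C=exon_X{chr1:12345678-123456987}~0`.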
	313	################
	314	# check for overlaps using mid-point
	315	#
	316	def mid_overlaps(qdom_list, sdom_list):
	317
	318	if (len(qdom_list) != len(sdom_list)):
	319	return False
	320
	321	for ix, q_dom in enumerate(qdom_list):
	322	s_dom = sdom_list[ix]
	323	q_mid = q_dom.q_start + (q_dom.q_end - q_dom.q_start + 1)/2.0
	324	if not (q_mid >= s_dom.q_start and q_mid <= s_dom.q_end):
	325	return False
	326
	327	q_qfract, q_sfract = overlap_fract(q_dom, s_dom) # overlap from query perspective
	328	s_sfract, s_qfract = overlap_fract(s_dom, q_dom) # overlap from subject perspective
	329
	330	q_dom.overlap_list.append({"dom": s_dom, "q_over": q_qfract, "s_over": q_sfract})
	331	s_dom.overlap_list.append({"dom": q_dom, "q_over": s_qfract, "s_over": s_sfract})
	332
	333	return True
	334
	335	################
	336	# find_overlaps -- populates dom.overlap_list for qdoms, sdoms
	337	#
	338	def find_overlaps(qdom_list, sdom_list, over_thresh):
	339	# find qdom, sdom overlaps in O(N) time
	340	#
	341
	342	if (len(sdom_list) == 0 or len(qdom_list)==0):
	343	return
	344
	345	if (len(sdom_list) == len(qdom_list)): # same number of domains
	346	if (mid_overlaps(qdom_list, sdom_list)):
	347	return;
	348	else:
	349	for d in qdom_list:
	350	d.overlap_list = []
	351	for d in sdom_list:
	352	d.overlap_list = []
	353
	354
	355	qdom_queue = [x for x in qdom_list] # build a duplicate list
	356	sdom_queue = [x for x in sdom_list]
	357
	358	qdom = qdom_queue.pop(0) # get the first element of each
	359	sdom = sdom_queue.pop(0)
	360
	361	while (True):
	362	pop_s = pop_q = False
	363
	364	q_qfract, q_sfract = overlap_fract(qdom, sdom) # overlap from query perspective
	365	if (q_qfract > over_thresh or q_sfract > over_thresh):
	366	qdom.append_overlap({"dom": sdom, "q_over": q_qfract, "s_over": q_sfract})
	367	qdom.over_cnt += 1
	368
	369	s_sfract, s_qfract = overlap_fract(sdom, qdom) # overlap from subject perspective
	370	if (s_qfract > over_thresh or s_sfract > over_thresh):
	371	sdom.append_overlap({"dom": qdom, "q_over": s_qfract, "s_over": s_sfract})
	372	sdom.over_cnt += 1
	373
	374	# check to see if we've used up the domain
	375	if (qdom.s_end >= sdom.s_end):
	376	pop_s = True
	377	# else there are more s_dom's that are part of this q_dom
	378
	379	if (sdom.q_end >= qdom.q_end):
	380	pop_q = True
	381	# else there are more q_dom's that are part of this s_dom
	382
	383	# print 'QS: %s %s\t%s %s' %(pop_q, pop_s, qdom, sdom)
	384
	385	if (len(qdom_queue) > 0):
	386	if (pop_q): # done with this qdom, get next
	387	qdom = qdom_queue.pop(0)
	388	elif (pop_q): # don't break until we try to get the next domain
	389	break;
	390
	391	if (len(sdom_queue) > 0):
	392	if (pop_s): # done with this sdom, get next
	393	sdom = sdom_queue.pop(0)
	394	elif (pop_s): # don't break until we try to get the next domain
	395	break;
	396	####
	397	# all done with overlaps
	398
	399	# # print "overlaps done"
	400	# for qd in qdom_list:
	401	# print qd, qd.over_cnt
	402	# for sd in qd.overlap_list:
	403	# print " s: q_over %.3f s_over: %.3f %s" % (sd['q_over'], sd['s_over'], str(sd['dom']))
	404	# print "===="
	405
	406	# for sd in sdom_list:
	407	# print sd, sd.over_cnt
	408	# for qd in sd.overlap_list:
	409	# print " q: q_over %.3f s_over: %.3f %s" % (qd['q_over'], qd['s_over'], str(qd['dom']))
	410	# print "===="
	411
	412	################
	413	# info_overlaps -- populates dom.overlap_list for qdoms, sdoms
	414	#
	415	def find_info_overlaps(info_list, dom_list):
	416
	417	if (len(info_list) == 0 or len(dom_list)==0):
	418	return
	419
	420	info_queue = [x for x in info_list] # build a duplicate list
	421	dom_queue = [x for x in dom_list]
	422
	423	info = info_queue.pop(0) # get the first element of each
	424	dom = dom_queue.pop(0)
	425
	426	while (True):
	427	pop_d = pop_i = False
	428
	429	if (dom.rxr == 'RX'): # use dom.q_start/q_end
	430	if (dom.q_end < info.p_start):
	431	pop_d = True
	432	elif (dom.q_end >= info.p_start and dom.q_start <= info.p_end): # overlap
	433	dom.info_dom = info
	434	pop_d = True
	435	pop_i = True
	436	elif (info.p_end < dom.q_start):
	437	pop_i = True
	438
	439	else: # use dom.s_start/s_end
	440	if (dom.s_end < info.p_start):
	441	pop_d = True
	442	elif (dom.s_end >= info.p_start and dom.s_start <= info.p_end): # overlap
	443	dom.info_dom = info
	444	pop_d = True
	445	pop_i = True
	446	elif (info.p_end < dom.s_start):
	447	pop_i = True
	448
	449	if (len(info_queue) > 0):
	450	if (pop_i): # done with this info, get next
	451	info = info_queue.pop(0)
	452	elif (pop_i): # don't break until we try to get the next domain
	453	break;
	454
	455	if (len(dom_queue) > 0):
	456	if (pop_d): # done with this dom, get next
	457	dom = dom_queue.pop(0)
	458	elif (pop_d):
	459	break;
	460
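The queue-based loop in `find_info_overlaps()` is a two-pointer sweep over two coordinate-sorted lists. A minimal sketch of that pattern follows, assuming each list holds inclusive `(start, end)` tuples that are sorted and non-overlapping within the list. The name `pair_sorted_intervals` is hypothetical, and the index-based termination is a simplification of the script's pop-and-break logic.

```python
# Hypothetical sketch of the sweep: advance whichever interval ends first,
# and record a pair (then advance both) when the two intervals overlap.
def pair_sorted_intervals(infos, doms):
    """Return a list of (dom_index, info_index) pairs that overlap."""
    pairs = []
    i = d = 0
    while i < len(infos) and d < len(doms):
        i_start, i_end = infos[i]
        d_start, d_end = doms[d]
        if d_end < i_start:        # dom lies entirely left of info
            d += 1
        elif i_end < d_start:      # info lies entirely left of dom
            i += 1
        else:                      # overlap: pair them, move past both
            pairs.append((d, i))
            d += 1
            i += 1
    return pairs
```

Each step advances at least one index, so the sweep visits every element at most once and runs in time linear in the combined list length.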
	461	################
	462	# build_multi_dict -- builds dictionaries of multiple overlaps in
	463	# qdom.overlap_list or sdom.overlap_list
	464	# returns multi_dict[idnum]
	465	#
	466	def build_multi_dict(dom_list):
	467	# this code looks for xdom's that are associated with multiple ydoms
	468	#
	469	multi_dict = {} # dict of {qids:/sdom:/qdoms:[]}
	470	for dom in dom_list: # for each subject domain
	471	if (dom.over_cnt <= 1):
	472	continue
	473
	474	multi_id_list = []
	475	multi_dom_list = []
	476	multi_q_cnt = 0
	477	for xd_over_yd in dom.overlap_list: # a set of q_doms that overlap the subject
	478	multi_q_cnt += 1
	479	multi_id_list.append(xd_over_yd["dom"].idnum) # these are q_dom idnum's
	480	multi_dom_list.append(xd_over_yd["dom"]) # these are q_doms
	481
	482	if (multi_q_cnt > 1): # only save when two (or more) overlaps
	483	multi_dict[dom.idnum] = {"yids": multi_id_list, "ydoms":multi_dom_list, 'xdom':dom}
	484
	485	# # print out current multi_q_list
	486	# print "--- multi_q dict ---"
	487	# for db in multi_dict.keys():
	488	# print "sdom: %s"%(db)
	489	# for ix, qd in enumerate(multi_dict[db]['ydoms']):
	490	# print " %d %d: %s"%(ix, multi_dict[db]['yids'][ix], qd)
	491
	492	# print "--- multi_dict done"
	493
	494	return multi_dict
	495
	496	################
	497	# find_best_id() -- returns id of domain with longest 'q_over'
	498	#
	499	def find_best_id(overlap_list, over_type):
	500
	501	max_fract = 0.0
	502	max_idnum = 0
	503	for over_d in overlap_list:
	504	if (over_d[over_type] > max_fract):
	505	max_idnum = over_d['dom'].idnum
	506	max_fract = over_d[over_type]
	507
	508	return max_idnum
	509
	510	################################################################
	511	# final labeling routine -- leave qdom's alone, modify sdoms based on qdoms.
	512	################
	513	# sdom's in more than one qdom are in multi_q_dict[]
	514	# qdom's in more than one sdom are in multi_s_dict[]
	515	# everyone else just gets the qdom name
	516	# returns sdom_displayed_dict{idnum} -- the set of sdoms that have been modified
	517	#
	518	# 13-Nov-2018 -- ensure that there is an info_dom before replacing info_dom.text
	519	#
	520	def label_doms(qdom_list, sdom_list, multi_q_dict, multi_s_dict):
	521
	522	sdom_displayed_dict = {}
	523	for qdom in qdom_list:
	524	# qdom's stay the same
	525	qdom.out_str = qdom.text
	526
	527	# check for s_doms with multiple q_doms
	528	if (qdom.idnum in multi_s_dict):
	529	# find the best, name it exon_X, find the rest, name it qdom.name
	530	multi_s_entry = multi_s_dict[qdom.idnum]
	531	best_id = find_best_id(qdom.overlap_list,'q_over') # find sdom with most overlap
	532	for s_over in qdom.overlap_list: # find the sdom's that overlap this qdom
	533	sdom = s_over['dom']
	534	if (sdom.idnum == best_id):
	535	sdom.out_str = replace_name(sdom.text, qdom.name, str(qdom.color))
	536	if (sdom.info_dom):
	537	sdom.info_dom.out_str = replace_name(sdom.info_dom.text,qdom.name, str(qdom.color))
	538	else:
	539	sdom.out_str = replace_name(sdom.text, "exon_X","0")
	540	if (sdom.info_dom):
	541	sdom.info_dom.out_str = replace_name(sdom.info_dom.text,"exon_X","0")
	542	sdom_displayed_dict[sdom.idnum] = sdom;
	543	continue # prevents re-labeling later
	544
	545	# check for q_doms with multiple doms
	546	for sd_over in qdom.overlap_list:
	547	sdom = sd_over['dom']
	548	# it might make sense to do this in a second for loop after
	549	# all the multiple stuff is done
	550	if (sdom.idnum not in multi_q_dict):
	551	# this is the simplest case -- sdom.text gets qdom.name
	552	if (sdom.idnum not in sdom_displayed_dict):
	553	sdom.out_str = replace_name(sdom.text, qdom.name, str(qdom.color))
	554	if (sdom.info_dom):
	555	sdom.info_dom.out_str = replace_name(sdom.info_dom.text,qdom.name, str(qdom.color))
	556	else:
	557	# this sdom belongs to multiple q_doms, add each of those q_doms to the name
	558	exon_str='exon_'
	559	# "ydoms" here are the qdoms overlapped by sdom
	560	exon_str += '/'.join([ x.name.split("_")[1] for x in multi_q_dict[sdom.idnum]['ydoms']])
	561	sdom.out_str = replace_name(sdom.text, exon_str,"0")
	562	if (sdom.info_dom):
	563	sdom.info_dom.out_str = replace_name(sdom.info_dom.text,exon_str,"0")
	564
	565	sdom_displayed_dict[sdom.idnum]=sdom
	566
	567	# done with labeling sdoms based on qdoms, but some may be unlabeled
	568	# check for missing s_doms
	569	while (len(sdom_displayed_dict.keys()) < len(sdom_list)):
	570	for sdom in sdom_list:
	571	if (sdom.idnum not in sdom_displayed_dict):
	572	sdom.out_str = replace_name(sdom.text, "exon_X","0")
	573	if (sdom.info_dom):
	574	sdom.info_dom.out_str = replace_name(sdom.info_dom.text,"exon_X","0")
	575
	576	sdom_displayed_dict[sdom.idnum] = sdom
	577
	578	return sdom_displayed_dict
	579
	580	################
	581	#
	582	# aa_to_exon() --- given a coordinate and the corresponding exon map, return the exon coordinate
	583	# (can only be done for aligned exons)
	584	#
	585	# this version of the function must use an info_list, not an
	586	# align_list, because it uses p_start/p_end rather than q_start/s_start, etc.
	587	# a version using qp_start/sp_start would also need a target argument
	588	#
	589	def aa_to_exon(aa_coords, exon_info_list):
	590
	591	sorted_aa_coords = sorted(aa_coords)
	592
	593	pos_strand = True
	594	if (exon_info_list[0].d_start > exon_info_list[0].d_end):
	595	pos_strand = False
	596
	597	ex_x = 0
	598	exon_coords = []
	599
	600	aap_x = 0
	601	this_aap = sorted_aa_coords[aap_x]
	602	while (ex_x < len(exon_info_list)):
	603	this_exon = exon_info_list[ex_x]
	604	if (this_aap <= this_exon.p_end and this_aap >= this_exon.p_start):
	605	aa_dna_offset = (this_aap - this_exon.p_start) * 3
	606
	607	if (pos_strand):
	608	aa_dna_pos = this_exon.d_start + aa_dna_offset
	609	else:
	610	aa_dna_pos = this_exon.d_start - aa_dna_offset
	611
	612	exon_coords.append({'chrom':this_exon.chrom, 'dpos':aa_dna_pos})
	613	aap_x += 1
	614	if (aap_x < len(sorted_aa_coords)):
	615	this_aap = sorted_aa_coords[aap_x]
	616	else:
	617	break
	618	else:
	619	ex_x += 1
	620
	621	aa_coord_dict = {}
	622	for aap_x, aap in enumerate(sorted_aa_coords):
	623	aa_coord_dict[aap] = exon_coords[aap_x]
	624
	625	return [aa_coord_dict[ax] for ax in aa_coords]
	626
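The coordinate arithmetic inside `aa_to_exon()` can be sketched for a single residue: find the exon whose protein range contains the position, then step three bases per residue from the exon's genomic start, adding on the plus strand and subtracting on the minus strand. The sketch assumes exons are given as dictionaries with the same field names as `exonInfo`. The function name `aa_to_genome` is hypothetical, and strand is decided per exon here, whereas the script takes it from the first exon in the list.

```python
# Hypothetical single-position version of the mapping in aa_to_exon().
def aa_to_genome(aa_pos, exons):
    """Map a protein coordinate to (chrom, genomic position), or None."""
    for ex in exons:
        if ex['p_start'] <= aa_pos <= ex['p_end']:
            offset = (aa_pos - ex['p_start']) * 3    # 3 bases per residue
            if ex['d_start'] <= ex['d_end']:         # plus strand
                return (ex['chrom'], ex['d_start'] + offset)
            return (ex['chrom'], ex['d_start'] - offset)  # minus strand
    return None                                      # not inside any exon
```

A position that falls in no exon returns `None` in this sketch.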
	627	################
	628	#
	629	def set_data_fields(args, line_data) :
	630
	631	field_str = 'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore BTOP dom_annot'
	632	field_qs_str = 'qseqid qlen sseqid slen pident length mismatch gapopen qstart qend sstart send evalue bitscore BTOP dom_annot'
	633
	634	if (len(line_data) > 1) :
	635	if ((not args.have_qslen) and re.search(r'\d+',line_data[1])):
	636	args.have_qslen=True
	637
	638	if ((not args.dom_info) and re.search(r'^\|[DX][XD]\:',line_data[-1])):
	639	args.dom_info = True
	640
	641	end_field = -1
	642	fields = field_str.split(' ')
	643
	644	if (args.have_qslen):
	645	fields = field_qs_str.split(' ')
	646
	647	if (args.dom_info):
	648	fields.append('dom_info')
	649	end_field = -2
	650
	651	return (fields, end_field)
	652
	653	################################################################
	654	#
	655	# main program
	656	# print "#"," ".join(sys.argv)
	657
	658	def main():
	659
	660	parser=argparse.ArgumentParser(description='scan_exons.py result_file.m8CB : re-label subject exons to match query')
	661	parser.add_argument('--have_qslen', help='bl_tab fields include query/subject lengths',dest='have_qslen',action='store_true',default=False)
	662	parser.add_argument('--dom_info', help='raw domain coordinates included',action='store_true',default=False)
	663	parser.add_argument('--fill_gcoords', help='fill in genomic coordinates',action='store_true',default=False)
	664	parser.add_argument('files', metavar='FILE', nargs='*', help='files to read, if empty, stdin is used')
	665
	666	args=parser.parse_args()
	667
	668	end_field = -1
	669	data_fields_reset=False
	670
	671	(fields, end_field) = set_data_fields(args, [])
	672
	673	if (args.have_qslen and args.dom_info):
	674	data_fields_reset=True
	675
	676	saved_qdom_list = []
	677	qdom_list = []
	678
	679	for line in fileinput.input(args.files):
	680	# pass through comments
	681	if (line[0] == '#'):
	682	print line, # ',' because have not stripped
	683	continue
	684
	685	################
	686	# break up tab fields, check for extra fields
	687	line = line.strip('\n')
	688	line_data = line.split('\t')
	689	if (not data_fields_reset): # look for --have_qslen number, --dom_info data, even if not set
	690	(fields, end_field) = set_data_fields(args, line_data)
	691	data_fields_reset = True
	692
	693	################
	694	# get exon annotations
	695	data = parse_protein(line_data,fields,"exon") # get score/alignment/domain data
	696
	697	if (len(data['sdom_list'])==0 and len(data['qdom_list'])==0):
	698	print line # no domains to be edited, print stripped line and continue
	699	continue
	700
	701	# qdom_list=[] outside of loop for cases where the qseqid==sseqid match is not first
	702	if len(data['qdom_list'])== 0:
	703	if data['qseqid'] == data['sseqid']:
	704	saved_qdom_list = [ copy.deepcopy(x) for x in data['sdom_list']]
	705	max_sdom_id=len(data['sdom_list'])+1
	706	for qdom in saved_qdom_list:
	707	qdom.rxr = 'RX'
	708	qdom.idnum = max_sdom_id
	709	max_sdom_id += 1
	710
	711	qdom_list = [copy.deepcopy(x) for x in saved_qdom_list]
	712	else:
	713	qdom_list = data['qdom_list']
	714
	715	# print out non-exon info
	716
	717	if (len(qdom_list) == 0):
	718	print line
	719	continue
	720
	721	btab_str = '\t'.join(str(data[x]) for x in fields[:end_field])
	722	# print # comment out for single line
	723
	724	################
	725	# find overlaps and multi-overlaps
	726	#
	727	find_overlaps(qdom_list,data['sdom_list'], 0.2)
	728
	729	multi_q_dict = build_multi_dict(data['sdom_list']) # keys are sdoms hitting multiple qdoms
	730	multi_s_dict = build_multi_dict(qdom_list) # keys are qdoms hitting multiple sdoms
	731
	732	################
	733	# label qdoms, relabel sdoms
	734	#
	735	sdom_displayed_dict = label_doms(qdom_list, data['sdom_list'], multi_q_dict, multi_s_dict)
	736
	737	################
	738	# print exon annotations
	739	#
	740	q_exon_list = data['qdom_list']
	741
	742	s_exon_list = [sdom_displayed_dict[x] for x in sdom_displayed_dict.keys()]
	743
	744	################
	745	# if args.fill_gcoords, then do the transformations on the current exon lists
	746
	747	if (args.fill_gcoords):
	748	sa_from_qa = []
	749	for q_ex in q_exon_list:
	750	sa_from_qa.append(q_ex.q_start)
	751	sa_from_qa.append(q_ex.q_end)
	752
	753	# have list of coordinates, map them to exon
	754	sex_from_qa2sa = aa_to_exon(sa_from_qa,data['sinfo_list'])
	755
	756	for iqx, q_ex in enumerate(q_exon_list):
	757	sg_start = sex_from_qa2sa[2*iqx]
	758	sg_end = sex_from_qa2sa[2*iqx+1]
	759	sg_replace="::%s:%d-%d}"%(sg_start['chrom'],sg_start['dpos'],sg_end['dpos'])
	760	q_ex.text=re.sub(r'\}',sg_replace,q_ex.text)
	761	q_ex.out_str=re.sub(r'\}',sg_replace,q_ex.out_str)
	762
	763	qa_from_sa = []
	764	for s_ex in s_exon_list:
	765	qa_from_sa.append(s_ex.q_start)
	766	qa_from_sa.append(s_ex.q_end)
	767
	768	# have list of coordinates, map them to exon
	769	qex_from_sa2qa = aa_to_exon(qa_from_sa,data['qinfo_list'])
	770
	771	for isx, s_ex in enumerate(s_exon_list):
	772	qg_start = qex_from_sa2qa[2*isx]
	773	qg_end = qex_from_sa2qa[2*isx+1]
	774	qg_replace="{%s:%d-%d::"%(qg_start['chrom'],qg_start['dpos'],qg_end['dpos'])
	775	s_ex.text=re.sub(r'\{',qg_replace,s_ex.text)
	776	s_ex.out_str = re.sub(r'\{',qg_replace,s_ex.out_str)
	777
	778	sorted_exon_list = sorted(q_exon_list+s_exon_list,key = lambda r: r.idnum)
	779
	780	dom_bar_str = ''
	781	for exon in sorted_exon_list:
	782	# print exon.print_bar_str() # for multi-line output
	783	dom_bar_str += exon.print_bar_str()
	784
	785	info_bar_str = ''
	786	for info in data['qinfo_list'] + data['sinfo_list']:
	787	info_bar_str += info.text
	788
	789	print '\t'.join((btab_str, dom_bar_str, info_bar_str))
	790
	791	################
	792	# run the program ...
	793
	794	if __name__ == '__main__':
	795	main()
	796

+2

-3

scripts/summ_domain_ident.pl

0		#!/usr/bin/perl -w
	0	#!/usr/bin/env perl
1	1
2	2	################################################################
3	3	# copyright (c) 2014 by William R. Pearson and The Rector &

21	21	# parse:
22	22	# sp|P09488|GSTM1_HUMAN gi|121735|sp|P09488.3|GSTM1_HUMAN 100.00 218 0 0 1 218 1 218 2.9e-113 408.2 218M |RX:1-12:1-12:s=64;b=25.0;I=1.000;Q=47.5;C=exon_1|RX:13-37:13-37:s=128;b=49.9;I=1.000;Q=121.4;C=exon_2|RX:38-59:38-59:s=125;b=48.7;I=1.000;Q=117.9;C=exon_3|RX:60-86:60-86:s=145;b=56.5;I=1.000;Q=141.0;C=exon_4|RX:87-120:87-120:s=185;b=72.1;I=1.000;Q=187.2;C=exon_5|RX:121-152:121-152:s=174;b=67.8;I=1.000;Q=174.5;C=exon_6|RX:153-189:153-189:s=197;b=76.8;I=1.000;Q=201.0;C=exon_7|RX:190-218:190-218:s=151;b=58.9;I=1.000;Q=147.9;C=exon_8
23	23
24
25
	24	use warnings;
26	25	use strict;
27	26	use Getopt::Long;
28	27	use Pod::Usage;

~~seq/gstt1_pssm.asn1~~

Binary diff not shown

+16

-13

src/altlib.h

38	38	#define VMSPIR 5
39	39	#define GCGBIN 6
40	40	#define FASTQ 7
41		#define LASTTXT 7
	41	#define ACC_SCRIPT 9
	42	#define LASTTXT 9
42	43	#define ACC_LIST 10
43	44
44	45	#include "mm_file.h"

94	95	#endif
95	96
96	97	int (*getliba[LASTLIB])(unsigned char *, int, char *, int, fseek_t *, int *,
97		struct lmf_str *, long *)={
98		agetlib,lgetlib,pgetlib,egetlib,
99		igetlib,vgetlib,gcg_getlib,qgetlib,
100		agetlib,agetlib
	98	struct lmf_str *, long *)={
	99	agetlib,lgetlib,pgetlib,egetlib, /* 0 - 3 */
	100	igetlib,vgetlib,gcg_getlib,qgetlib, /* 4- 7 */
	101	agetlib,agetlib /* 8,9 */
101	102	#ifdef UNIX
102		,agetlib
	103	,agetlib /* 10 */
103	104	#ifdef NCBIBL13
104		,ncbl_getliba
	105	,ncbl_getliba /* 11 */
105	106	#else
106		,ncbl2_getliba
	107	,ncbl2_getliba /* 12 */
107	108	#endif
108	109	#ifdef NCBIBL20
109		,ncbl2_getliba
	110	,ncbl2_getliba /* 12 */
	111	#else
	112	,agetlib /* 12 - place holder */
110	113	#endif
111	114	#ifdef MYSQL_DB
112		,agetlib
113		,agetlib
114		,agetlib
115		,mysql_getlib
	115	,agetlib /* 13 */
	116	,agetlib /* 14 */
	117	,agetlib /* 15 */
	118	,mysql_getlib /* 16 */
116	119	#endif
117	120	#endif
118	121	};

+1

-1

src/cal_cons2.c

0		/* cal_cons.c - routines for printing translated alignments for
	0	/* cal_cons.c - routines for printing alignments for
1	1	fasta, ssearch, ggsearch, glsearch */
2	2
3	3	/* $Id: cal_cons.c 1280 2014-08-21 00:47:55Z wrp $ */

+20

-12

src/comp_lib9.c

638	638
639	639	/* Open query library */
640	640	if ((q_file_p= open_lib(q_lib_p, m_msg.qdnaseq,qascii,!m_msg.quiet))==NULL) {
641		s_abort(" cannot open library ",m_msg.tname);
	641	fprintf(stderr,"*** error [%s:%d] cannot open library %s\n",__FILE__,__LINE__, m_msg.tname);
	642	exit(1);
	643
	644	/* s_abort(" cannot open library ",m_msg.tname); */
642	645	}
643	646	/* Fetch first sequence */
644	647	qlib = 0;

663	666
664	667	/* if protein and ldb_info.term_code set, add '*' if not there */
665	668	if (m_msg.ldb_info.term_code && !(m_msg.qdnaseq==SEQT_DNA || m_msg.qdnaseq==SEQT_RNA) &&
666		aa0[0][m_msg.n0-1]!='*') {
667		aa0[0][m_msg.n0++]='*';
	669	aa0[0][m_msg.n0-1]!=aascii['*']) {
	670	aa0[0][m_msg.n0++]=aascii['*'];
668	671	aa0[0][m_msg.n0]=0;
669	672	}
670	673

762	765	}
763	766
764	767	/* get a list of files to search */
765		lib_list_p = lib_select(lib_db_file, m_msg.ltitle, m_msg.flstr,
766		m_msg.ldb_info.ldnaseq);
	768	lib_list_p = lib_select(lib_db_file, m_msg.ltitle, m_msg.flstr, m_msg.ldb_info.ldnaseq);
767	769	}
768	770	else {
769	771	/* get a list of files to search */

914	916
915	917	if (!validate_params(aa0[0],m_msg.n0, &m_msg, &pst,
916	918	lascii, pascii)) {
917		fprintf(stderr," * ERROR * validate_params() failed:\n -- %s\n", argv_line);
	919	fprintf(stderr," *** error [%s:%d] - validate_params() failed:\n -- %s\n", __FILE__, __LINE__, argv_line);
918	920	exit(1);
919	921	}
920	922

1520	1522	if (pst.do_rep) {
1521	1523	if (pst.zsflag >= 0) {
1522	1524	for (i=m_msg.nskip; i < m_msg.nskip + m_msg.nshow; i++) {
1523		bestp_arr[i]->repeat_thresh =
1524		min(E1_to_s(pst.e_cut_r, m_msg.n0, bestp_arr[i]->seq->n1,
1525		pst.zdb_size, m_msg.pstat_void),bestp_arr[i]->rst.score[pst.score_ix]);
	1525	if (bestp_arr[i]->rst.escore > pst.e_cut_r) {
	1526	bestp_arr[i]->repeat_thresh = bestp_arr[i]->rst.score[pst.score_ix] * 10;
	1527	}
	1528	else {
	1529	bestp_arr[i]->repeat_thresh =
	1530	min(E1_to_s(pst.e_cut_r, m_msg.n0, bestp_arr[i]->seq->n1, pst.zdb_size, m_msg.pstat_void),
	1531	bestp_arr[i]->rst.score[pst.score_ix]);
	1532	}
1526	1533	}
1527	1534	}
1528	1535	else {

2242	2249	getlib() calls */
2243	2250	/* **************************************************************** */
2244	2251	struct getlib_str *
2245		init_getlib_info(struct lib_struct *lib_list_p, int maxn,long max_memK) {
	2252	init_getlib_info(struct lib_struct *lib_list_p, int maxn, long max_memK) {
2246	2253	struct getlib_str *my_getlib_info;
2247	2254	unsigned char *aa1save;
2248	2255

2353	2360	if ((cur_lib_p->m_file_p =
2354	2361	open_lib(cur_lib_p, m_msp->ldb_info.ldnaseq, lascii, !m_msp->quiet))
2355	2362	==NULL) {
2356		fprintf(stderr," cannot open library %s\n",cur_lib_p->file_name);
	2363	fprintf(stderr,"(*** warning [%s:%d] cannot open library %s\n",__FILE__,__LINE__,cur_lib_p->file_name);
2357	2364	getlib_info->lib_list_p = getlib_info->lib_list_p->next;
2358	2365	if (getlib_info->lib_list_p == NULL) {
2359	2366	goto return_null;

2374	2381	/* if the library is NCBIBL20 and memory mapped, simply return
2375	2382	pointers to the memory map */
2376	2383	m_fd = getlib_info->lib_list_p->m_file_p;
2377		if (m_fd->get_mmap_chain) {
	2384
	2385	if (m_fd->get_mmap_chain && getlib_info->use_memory>=0) {
2378	2386	/* get a new seqr_chain */
2379	2387	my_seqr_chain =
2380	2388	new_seqr_chain(m_bufi_p->max_chain_seqs,(m_bufi_p->seq_buf_size+1),

+50

-9

src/compacc2e.c

222	222
223	223	/* subs_env takes a string, possibly with ${ENV}, and looks up all the
224	224	potential environment variables and substitutes them into the
225		string */
226
	225	string
	226	*/
227	227	void subs_env(char *dest, char *src, int dest_size) {
228	228	char *last_src, *bp, *bp1;
229	229

273	273	dest[dest_size-1]='\0';
274	274	}
275	275	}
276
277	276
278	277	void
279	278	selectbest(struct beststr **bptr, int k, int n) /* k is rank in array */

1403	1402	char *link_lib_str;
1404	1403	char link_script[MAX_LSTR];
1405	1404	int link_lib_type;
1406		char *bp, *link_bp;
	1405	char *bp, *link_bp, *bp_s;
1407	1406	FILE *link_fd=NULL; /* file for link accessions */
1408	1407
1409	1408	#ifndef UNIX

1466	1465	}
1467	1466
1468	1467	strncpy(link_script,link_bp,sizeof(link_script));
	1468	/* un-edit m_msp->link_lname */
	1469	if (bp != NULL) *bp = ' ';
	1470
1469	1471	link_script[sizeof(link_script)-1] = '\0';
	1472
	1473	/* convert + to space in script string */
	1474	for (bp_s = strchr(link_script+1,'+'); bp_s; bp_s=strchr(bp_s+1,'+')) {
	1475	*bp_s = ' ';
	1476	}
	1477
1470	1478	SAFE_STRNCAT(link_script," ",sizeof(link_script));
1471	1479	SAFE_STRNCAT(link_script,link_acc_file,sizeof(link_script));
1472	1480	SAFE_STRNCAT(link_script," >",sizeof(link_script));
1473	1481	SAFE_STRNCAT(link_script,link_lib_file,sizeof(link_script));
1474
1475		/* un-edit m_msp->link_lname */
1476		if (bp != NULL) *bp = ' ';
1477	1482
1478	1483	/* run link_script link_acc_file > link_lib_file */
1479	1484	status = system(link_script);

1580	1585	}
1581	1586
1582	1587	strncpy(lib_db_script,lib_bp,sizeof(lib_db_script));
	1588	bp = strchr(lib_db_script,'+');
	1589	for ( ; bp; bp=strchr(bp+1,'+')) {
	1590	*bp=' ';
	1591	}
	1592
1583	1593	lib_db_script[sizeof(lib_db_script)-1] = '\0';
1584	1594	SAFE_STRNCAT(lib_db_script," >",sizeof(lib_db_script));
1585	1595	SAFE_STRNCAT(lib_db_script,lib_db_file,sizeof(lib_db_script));
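
The hunks above replace each `+` in a script string with a space before the command is handed to `system()`, so that script arguments can be written without embedded spaces. A minimal sketch of that loop as a standalone helper follows; the name `plus_to_space()` and its return value are assumptions made for illustration and are not part of the FASTA source.

```c
#include <string.h>

/* Replace every '+' in s with ' ', in place, using the same
   strchr() walk as the diff. Returns the number of characters
   replaced. plus_to_space() is a hypothetical helper name. */
static int plus_to_space(char *s) {
  int n = 0;
  char *bp;

  for (bp = strchr(s, '+'); bp != NULL; bp = strchr(bp + 1, '+')) {
    *bp = ' ';
    n++;
  }
  return n;
}
```

Note that the `link_script` hunk starts its scan at `link_script+1`, skipping the first character, while this sketch scans from the start of the string.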

1649	1659
1650	1660	this->max_annot += (this->max_annot/2);
1651	1661	if ((this->tmp_arr_p= (struct annot_entry *)realloc(this->tmp_arr_p, this->max_annot*sizeof(struct annot_entry)))==NULL) {
1652		fprintf(stderr,"[*** error [%s:%d] - cannot reallocate tmp_ann_astr[%d]\n",
	1662	fprintf(stderr,"*** error [%s:%d] - cannot reallocate tmp_ann_astr[%d]\n",
1653	1663	__FILE__, __LINE__, this->max_annot);
1654	1664	return 0;
1655	1665	}

1702	1712	annotations back
1703	1713	*/
1704	1714
	1715	/* create filename for input accessions */
1705	1716	annot_bline_file[0] = '\0';
1706	1717
1707	1718	if ((annot_descr_file=(char *)calloc(MAX_STR,sizeof(char)))==NULL) {

1710	1721	}
1711	1722	annot_descr_file[0] = '\0';
1712	1723
	1724	/* add temporary directory if $TMP_DIR */
1713	1725	if ((bp=getenv("TMP_DIR"))!=NULL) {
1714	1726	strncpy(annot_bline_file,bp,sizeof(annot_bline_file));
1715	1727	annot_bline_file[sizeof(annot_bline_file)-1] = '\0';

1728	1740	goto no_annots;
1729	1741	}
1730	1742
	1743	/* write out accessions, sequence length */
1731	1744	for (i=0; i<nbest; i++) {
1732	1745	if (bestp_arr[i]->mseq->annot_req_flag) { continue; }
1733	1746	if ((strlen(bestp_arr[i]->mseq->bline) > DESCR_OFFSET) &&

1743	1756	}
1744	1757	fclose(annot_fd);
1745	1758
1746		subs_env(annot_script, sname+1, sizeof(annot_script));
	1759	/* convert '+' in annot_script to ' ' */
	1760	bp = strchr(sname+1,'+');
	1761	for ( ; bp; bp=strchr(bp+1,'+')) {
	1762	*bp=' ';
	1763	}
	1764
	1765	subs_env(annot_script, sname+1, sizeof(annot_script));
1747	1766	annot_script[sizeof(annot_script)-1] = '\0';
1748	1767	SAFE_STRNCAT(annot_script," ",sizeof(annot_script));
1749	1768	SAFE_STRNCAT(annot_script,annot_bline_file,sizeof(annot_script));

1752	1771
1753	1772	/* run annot_script annot_bline_file > annot_descr_file */
1754	1773	status = system(annot_script);
	1774
	1775	#ifdef DEBUG
	1776	if (debug) {
	1777	fprintf(stderr,"%s\n",annot_script);
	1778	}
	1779	#endif
	1780
1755	1781	if (!debug) {
1756	1782	#ifdef UNIX
1757	1783	unlink(annot_bline_file);

2171	2197
2172	2198	q_offset = m_msp->q_offset + m_msp->q_off - 1;
2173	2199	if (q_offset < 0) { q_offset = 0;}
	2200
	2201	/* convert '+' in annot_script to ' ' */
	2202	bp = strchr(sname+1,'+');
	2203	for ( ; bp; bp=strchr(bp+1,'+')) {
	2204	*bp=' ';
	2205	}
	2206
2174	2207	sprintf(annot_script,"%s \"%s\" %ld",sname+1, bline_descr,q_offset+m_msp->n0);
2175	2208	annot_script[sizeof(annot_script)-1] = '\0';
2176	2209

4104	4137	else if (aln && toupper(sp0) == 'N') aln->ngap_q++;
4105	4138	else if (aln && toupper(sp1) == 'N') aln->ngap_l++;
4106	4139	}
	4140	else if ((sp0 == '*' && toupper(sp1) == 'U') ||
	4141	(toupper(sp0) == 'U' && sp1 == '*')) {
	4142	spa_val = M_IDENT;
	4143	if (aln) {
	4144	aln->nident++;
	4145	aln->nmismatch--;
	4146	}
	4147	}
4107	4148
4108	4149	/* correct nident, nmismatch for N:N / X:X */
4109	4150	if (pam_x_id_sim < 0) { /* > 0 -> identical, similar */

+9

-6

src/defs.h

67	67
68	68	#ifndef MAX_MEMK
69	69	#if defined(BIG_LIB64) && (defined(COMP_THR) || defined(PCOMPLIB))
70		#define MAX_MEMK 8*1024*1024 /* 12 GB (<<10) for library in memory */
	70	#define MAX_MEMK 16*1024*1024 /* 16 GB (<<10) for library in memory */
71	71	#else
72	72	#define MAX_MEMK 2*1024*1024 /* 2 GB (<<10) for library in memory */
73	73	#endif

151	151	#define MX_M9SUMM 64 /* markx==9(c) */
152	152	#define MX_M10FORM 128 /* markx==10 - verbose output */
153	153	#define MX_M11OUT 256 /* markx==11 - lalign lav */
154		#define MX_M8OUT 512 /* markx==8 blast8 output */
155		#define MX_M8COMMENT 1024 /* markx==8 blast8 output */
156		#define MX_MBLAST 2048 /* markx=B blast output */
157		#define MX_MBLAST2 4096 /* markx=BB more blast output */
	154	#define MX_M8OUT 512 /* markx==8 blast tabular (-outfmt=6) output */
	155	#define MX_M8COMMENT 1024 /* markx==8 blast tabular (-outfmt=7) with comments output */
	156	#define MX_MBLAST 2048 /* markx=B blast alignment -outfmt=0 output */
	157	#define MX_MBLAST2 4096 /* markx=BB blast best scores and alignment (-outfmt=0) output */
158	158	#define MX_ANNOT_COORD 16384 /* -m 0, use -m 0B for both */
159	159	#define MX_ANNOT_MID 32768 /* markx 0M, 1M, 2M annotations in middle */
160	160	#define MX_RES_ALIGN_SCORE (1<<20) /* show residue alignment score, not alignment */
	161	#define MX_M8_BTAB_LEN (1<<21) /* show query/subject seq. lens in -m 8 output */
161	162
162		/* codes for -m 9 */
	163	/* codes for -m 9, -m 8C? */
163	164	#define SHOW_CODE_ID 1 /* identity only */
164	165	#define SHOW_CODE_IDD 2 /* identity with domains */
165	166	#define SHOW_CODE_ALIGN 4 /* encoded alignment */

168	169	#define SHOW_CODE_MASK 12 /* use higher bits for annotation format */
169	170	#define SHOW_CODE_EXT 16 /* encode identity, mismatch state */
170	171	#define SHOW_ANNOT_FULL 32 /* show full-length annot in calc_code */
	172	#define SHOW_CODE_DOMINFO 64 /* include raw domain info in btab/BTOP */
	173

+28

-6

src/doinit.c

293	293	m_msp->do_showbest = 1;
294	294	m_msp->ashow = -1;
295	295	m_msp->ashow_set = 0;
	296
296	297	m_msp->nmlen = DEF_NMLEN;
	298
	299
	300	/* values set in initfa.c: parse_ext_opts() */
297	301	m_msp->z_bits = 1;
298	302	m_msp->tot_ident = 0;
	303	m_msp->blast_ident = 0;
	304	m_msp->m8_show_annot = 0;
	305
299	306	m_msp->mshow_set = 0;
300	307	m_msp->mshow_min = 0;
301	308	m_msp->aln.llen = 60;

620	627	else {
621	628	m_msp->ann_arr_def[i_ann] = NULL;
622	629	}
623
624
625	630	}
626	631
627	632	/* read definitions of annotation symbols from a file */

710	715
711	716	return markx;
712	717	}
	718
	719	/* specify output format. If output format type is 'F', then provide
	720	file name and write to file.
	721
	722	Thus, -m "F8CB outfile.m8CB" writes -m 8CB output to outfile.m8CB
	723	Different format outputs can be written to different files
	724
	725	*/
713	726
714	727	void
715	728	pre_parse_markx(char *opt_arg, struct mngmsg *m_msp) {

757	770
758	771	/* first check for -m "F file" format */
759	772	if (optarg[0] == 'F') {
760		if ((bp=strchr(optarg+1,' '))==NULL) {
	773	if ((bp=strchr(optarg+1,' '))==NULL && (bp=strchr(optarg+1,'='))==NULL) {
761	774	fprintf(stderr,"-m F missing file name: %s\n",optarg);
762	775	return;
763	776	}
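
The change above lets the `-m F` option separate the format from the output file name with either a space or an `=`. A minimal sketch of that lookup follows. The function name `split_markx_file()` and the in-place termination at the separator are assumptions; the diff only shows the `strchr()` test.

```c
#include <string.h>

/* Split "F<format> <file>" or "F<format>=<file>" in place.
   A space is tried first, then '=', as in the diff.
   Returns a pointer to the file name, or NULL if the argument
   does not start with 'F' or has no separator.
   split_markx_file() is a hypothetical name for illustration. */
static char *split_markx_file(char *opt_arg) {
  char *bp;

  if (opt_arg[0] != 'F') return NULL;

  if ((bp = strchr(opt_arg + 1, ' ')) == NULL &&
      (bp = strchr(opt_arg + 1, '=')) == NULL) {
    return NULL;   /* "-m F missing file name" case */
  }

  *bp = '\0';      /* assumption: terminate the format part here */
  return bp + 1;
}
```

With this sketch, both `"F8CB outfile.m8CB"` and `"F8CB=outfile.m8CB"` yield the file name `outfile.m8CB`.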

823	836	void
824	837	parse_markx(char *optarg, struct markx_str *this) {
825	838	int itmp;
826		char ctmp, ctmp2;
	839	char ctmp, ctmp2, ctmp3;
827	840
828	841	itmp = 0;
829		ctmp = ctmp2 = '\0';
	842	ctmp = ctmp2 = ctmp3 = '\0';
830	843
831	844	if (optarg[0] == 'B') { /* BLAST alignment output */
832	845	this->markx = MX_MBLAST;

853	866	return;
854	867	}
855	868	else {
856		sscanf(optarg,"%d%c%c",&itmp,&ctmp,&ctmp2);
	869	sscanf(optarg,"%d%c%c%c",&itmp,&ctmp,&ctmp2,&ctmp3);
857	870	}
858	871	if (itmp==9) {
859	872	if (ctmp=='c') {this->show_code = SHOW_CODE_ALIGN;}

876	889	else if (ctmp2 == 'C') {this->show_code = SHOW_CODE_CIGAR;}
877	890	else if (ctmp2 == 'D') {this->show_code = SHOW_CODE_CIGAR + SHOW_CODE_EXT;}
878	891	else if (ctmp2 == 'B') {this->show_code = SHOW_CODE_BTOP;}
	892
	893	if (ctmp3 == 'L') {
	894	this->markx |= MX_M8_BTAB_LEN;
	895	this->show_code |= SHOW_CODE_DOMINFO;
	896	}
	897	else if (ctmp3 == 'l') {
	898	this->markx |= MX_M8_BTAB_LEN;
	899	}
	900
879	901	}
880	902	}
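
The `parse_markx()` change above reads a third suffix character so that `-m 8CBL` and `-m 8CBl` can request sequence lengths (and, for `L`, raw domain information) in the tabular output. The sketch below isolates that step. The two flag values are copied from the `defs.h` hunk in this commit; the `struct m8_opts` type, the `parse_m8()` name, and the guard on `8C` are assumptions, since the enclosing branch is not fully visible in the diff.

```c
#include <stdio.h>

#define MX_M8_BTAB_LEN (1<<21)   /* from defs.h in this commit */
#define SHOW_CODE_DOMINFO 64     /* from defs.h in this commit */

/* hypothetical container for the parsed pieces */
struct m8_opts {
  int fmt;            /* numeric format, e.g. 8 */
  char c1, c2, c3;    /* up to three suffix characters */
  int markx;          /* output flags */
  int show_code;      /* encoding flags */
};

/* Parse an argument such as "8CBL". Characters that are absent
   keep their '\0' initial value because sscanf() stops early. */
static struct m8_opts parse_m8(const char *arg) {
  struct m8_opts o = {0, '\0', '\0', '\0', 0, 0};

  sscanf(arg, "%d%c%c%c", &o.fmt, &o.c1, &o.c2, &o.c3);

  if (o.fmt == 8 && o.c1 == 'C') {   /* assumed enclosing condition */
    if (o.c3 == 'L') {
      o.markx |= MX_M8_BTAB_LEN;
      o.show_code |= SHOW_CODE_DOMINFO;
    }
    else if (o.c3 == 'l') {
      o.markx |= MX_M8_BTAB_LEN;
    }
  }
  return o;
}
```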
881	903

+42

-25

src/dropff2.c

116	116
117	117	f_str = (struct f_struct *) calloc(1, sizeof(struct f_struct));
118	118	if(f_str == NULL) {
119		fprintf(stderr, "Couldn't calloc f_str\n");
	119	fprintf(stderr, "*** error [%s:%d] - cannot calloc f_str [%lu]\n",
	120	__FILE__, __LINE__, sizeof(struct f_struct));
120	121	exit(1);
121	122	}
122	123

134	135	if (ppst->hsq[i0] < NMAP && ppst->hsq[i0] > mhv) mhv = ppst->hsq[i0];
135	136
136	137	if (mhv <= 0) {
137		fprintf (stderr, " maximum hsq <=0 %d\n", mhv);
	138	fprintf (stderr, "*** error [%s:%d] - maximum hsq <=0 %d\n",
	139	__FILE__, __LINE__, mhv);
138	140	exit (1);
139	141	}
140	142

146	148	f_str->hmask = (hmax >> f_str->kshft) - 1;
147	149
148	150	if ((f_str->aa0 = (unsigned char *) calloc(n0+1, sizeof(char))) == NULL) {
149		fprintf (stderr, " cannot allocate f_str->aa0 array; %d\n",n0+1);
	151	fprintf (stderr, "*** error [%s:%d] - cannot allocate f_str->aa0 array; %d\n",
	152	__FILE__, __LINE__, n0+1);
150	153	exit (1);
151	154	}
152	155	for (i=0; i<n0; i++) f_str->aa0[i] = aa0[i];
153	156	aa0 = f_str->aa0;
154	157
155	158	if ((f_str->aa0t = (unsigned char *) calloc(n0+1, sizeof(char))) == NULL) {
156		fprintf (stderr, " cannot allocate f_str0->aa0t array; %d\n",n0+1);
	159	fprintf (stderr, "*** error [%s:%d] - cannot allocate f_str0->aa0t array; %d\n",
	160	__FILE__, __LINE__, n0+1);
157	161	exit (1);
158	162	}
159	163	f_str->aa0ix = 0;
160	164
161	165	if ((f_str->harr = (struct hlstr *) calloc (hmax, sizeof (struct hlstr))) == NULL) {
162		fprintf (stderr, " cannot allocate hash array; hmax: %d hmask: %d\n",
163		hmax,f_str->hmask);
	166	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash array; hmax: %d hmask: %d\n",
	167	__FILE__, __LINE__, hmax,f_str->hmask);
164	168	exit (1);
165	169	}
166	170	if ((f_str->pamh1 = (int *) calloc (nsq+1, sizeof (int))) == NULL) {
167		fprintf (stderr, " cannot allocate pamh1 array\n");
	171	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh1 array [%d]\n",
	172	__FILE__, __LINE__, nsq+1);
168	173	exit (1);
169	174	}
170	175	if ((f_str->pamh2 = (int *) calloc (hmax, sizeof (int))) == NULL) {
171		fprintf (stderr, " cannot allocate pamh2 array\n");
	176	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh2 array [%d]\n",
	177	__FILE__, __LINE__, hmax);
172	178	exit (1);
173	179	}
174	180	if ((f_str->link = (struct hlstr *) calloc (n0, sizeof (struct hlstr))) == NULL) {
175		fprintf (stderr, " cannot allocate hash link array");
	181	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash link array [%d]",
	182	__FILE__, __LINE__, n0);
176	183	exit (1);
177	184	}
178	185

247	254	f_str->maxsav = MAXSAV;
248	255	if ((f_str->vmax = (struct savestr *)
249	256	calloc(MAXSAV,sizeof(struct savestr)))==NULL) {
250		fprintf(stderr, "Couldn't allocate vmax[%d].\n",f_str->maxsav);
	257	fprintf(stderr, "*** error [%s:%d] - cannot allocate vmax[%d].\n",
	258	__FILE__, __LINE__, f_str->maxsav);
251	259	exit(1);
252	260	}
253	261
254	262	if ((f_str->vptr = (struct savestr **)
255	263	calloc(MAXSAV,sizeof(struct savestr *)))==NULL) {
256		fprintf(stderr, "Couldn't allocate vptr[%d].\n",f_str->maxsav);
	264	fprintf(stderr, "*** error [%s:%d] - cannot allocate vptr[%d].\n",
	265	__FILE__, __LINE__, f_str->maxsav);
257	266	exit(1);
258	267	}
259	268
260	269	for (vmptr = f_str->vmax; vmptr < &f_str->vmax[MAXSAV]; vmptr++) {
261	270	vmptr->used = (int *) calloc(n0, sizeof(int));
262	271	if(vmptr->used == NULL) {
263		fprintf(stderr, "Couldn't alloc vmptr->used\n");
	272	fprintf(stderr, "*** error [%s:%d] - cannot alloc vmptr->used [%d]\n",
	273	__FILE__, __LINE__, n0);
264	274	exit(1);
265	275	}
266	276	}
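
The hunks in this file move every diagnostic to one pattern: `*** error [file:line] - message [size]`, with `__FILE__` and `__LINE__` supplied at the call site. A minimal sketch of that pattern, formatting into a buffer so the result can be inspected, is shown here. The `fmt_alloc_error()` function and `FMT_ALLOC_ERROR` macro are hypothetical; the commit itself writes each `fprintf()` call out by hand.

```c
#include <stdio.h>

/* Format an allocation-failure message in the style adopted by
   this commit. Returns the value of snprintf().
   fmt_alloc_error() is a hypothetical helper. */
static int fmt_alloc_error(char *buf, size_t n, const char *file, int line,
                           const char *what, int count) {
  return snprintf(buf, n, "*** error [%s:%d] - cannot allocate %s [%d]\n",
                  file, line, what, count);
}

/* Convenience wrapper that captures the call site, as the
   hand-written fprintf() calls in the diff do. */
#define FMT_ALLOC_ERROR(buf, n, what, count) \
  fmt_alloc_error((buf), (n), __FILE__, __LINE__, (what), (count))
```

Keeping the file and line in a fixed position makes the messages easy to locate with a text search across log output.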

284	294
285	295	if (f_str->diag == NULL)
286	296	{
287		fprintf (stderr, " cannot allocate diagonal arrays: %ld\n",
288		(long) MAXDIAG * (long) (sizeof (struct dstruct)));
	297	fprintf (stderr, "*** error [%s:%d] - cannot allocate diagonal arrays: %ld\n",
	298	__FILE__, __LINE__, (long) MAXDIAG * (long) (sizeof (struct dstruct)));
289	299	exit (1);
290	300	}
291	301

293	303	if ((f_str->aa1x =(unsigned char *)calloc((size_t)ppst->maxlen+2,
294	304	sizeof(unsigned char)))
295	305	== NULL) {
296		fprintf (stderr, "cannot allocate aa1x array %d\n", ppst->maxlen+2);
	306	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa1x array %d\n",
	307	__FILE__, __LINE__, ppst->maxlen+2);
297	308	exit (1);
298	309	}
299	310	f_str->aa1x++;

304	315
305	316	maxn0 = max(3*n0/2,MIN_RES);
306	317	if ((res = (int *)calloc((size_t)maxn0,sizeof(int)))==NULL) {
307		fprintf(stderr,"cannot allocate alignment results array %d\n",maxn0);
	318	fprintf(stderr,"*** error [%s:%d] - cannot allocate alignment results array %d\n",
	319	__FILE__, __LINE__, maxn0);
308	320	exit(1);
309	321	}
310	322	f_str->res = res;

314	326
315	327	/* initialize priors array. */
316	328	if((f_str->priors = (double *)calloc(ppst->nsq+1, sizeof(double))) == NULL) {
317		fprintf(stderr, "Couldn't allocate priors array.\n");
	329	fprintf(stderr, "*** error [%s:%d] - cannot allocate priors array [%d]\n",
	330	__FILE__, __LINE__, ppst->nsq+1);
318	331	exit(1);
319	332	}
320	333	calc_priors(f_str->priors, ppst, f_str, NULL, 0, ppst->pseudocts);

420	433	}
421	434
422	435	if (n0+n1+1 >= MAXDIAG) {
423		fprintf(stderr,"n0,n1 too large: %d, %d\n",n0,n1);
	436	fprintf(stderr,"*** error [%s:%d] - n0,n1 too large %d + %d > %d\n",
	437	__FILE__, __LINE__, n0,n1, MAXDIAG);
424	438	rst->score[0] = rst->score[1] = rst->score[2] = -1;
425	439	rst->escore = 2.0;
426	440	rst->segnum = 0;

642	656	if (ppst->debug_lib)
643	657	for (i=0; i<n10; i++)
644	658	if (f_str->aa1x[i]>ppst->nsq) {
645		fprintf(stderr,
646		"residue[%d/%d] %d range (%d)\n",i,n1,
647		f_str->aa1x[i],ppst->nsq);
	659	fprintf(stderr, "*** error [%s:%d] - residue[%d/%d] %d range (%d)\n",
	660	__FILE__, __LINE__, i,n1, f_str->aa1x[i],ppst->nsq);
648	661	f_str->aa1x[i]=0;
649	662	n10=i-1;
650	663	}

842	855	}
843	856	tot += ctot;
844	857	if (ci >= 0) {
845		if (ci >= n0) {fprintf(stderr," warning - ci off end %d/%d\n",ci,n0);}
	858	if (ci >= n0) {fprintf(stderr,"*** warning [%s:%d] - ci off end %d/%d\n",
	859	__FILE__, __LINE__, ci,n0);}
846	860	else {
847	861	*aa0pt++ = aa0p[ci];
848	862	aa0p[ci] += 32;

855	869	if (aa0t_flg) {
856	870	dmax->dp -= f_str->aa0ix; /* shift ->dp for aa0t */
857	871	if ((ci=(int)(aa0pt-f_str->aa0t)) > n0) {
858		fprintf(stderr," warning - aapt off %d/%d end\n",ci,n0);
	872	fprintf(stderr,"*** warning [%s:%d] - aapt off %d/%d end\n",
	873	__FILE__, __LINE__, ci,n0);
859	874	}
860	875	else
861	876	*aa0pt++ = 0; /* skip over NULL */

1157	1172	have_ares = 0x2; /* set 0x2 bit to indicate local copy */
1158	1173
1159	1174	if ((a_res = (struct a_res_str *)calloc(1, sizeof(struct a_res_str)))==NULL) {
1160		fprintf(stderr," [do_walign] Cannot allocate a_res");
	1175	fprintf(stderr,"*** error [%s:%d] - cannot allocate a_res [%lu]",
	1176	__FILE__, __LINE__, sizeof(struct a_res_str));
1161	1177	return NULL;
1162	1178	}
1163	1179

1180	1196	*/
1181	1197
1182	1198	if ((aa0t = (unsigned char *)calloc(n0+1,sizeof(unsigned char)))==NULL) {
1183		fprintf(stderr," cannot allocate aa0t %d\n",n0+1);
	1199	fprintf(stderr,"*** error [%s:%d] - cannot allocate aa0t %d\n",
	1200	__FILE__, __LINE__, n0+1);
1184	1201	exit(1);
1185	1202	}
1186	1203

+29

-14

src/dropfx.c

2065	2065	#define XTERNAL
2066	2066	#include "upam.h"
2067	2067
	2068	/* this code shows the alignment of the protein with the three phased
	2069	translation of the DNA sequence
	2070	*/
	2071
2068	2072	extern void
2069		display_alig(int *a, unsigned char *dna, unsigned char * pro, int length, int ld)
	2073	display_alig(int *a, unsigned char *dna_p, unsigned char * pro, int length, int ld)
2070	2074	{
2071	2075	int len = 0, i, j, x, y, lines, k;
2072	2076	char line1[100], line2[100], line3[100],
2073	2077	tmp[10] = " ";
2074		unsigned char *dna1, c1, c2, c3, *st;
2075
2076		dna1 = ckalloc((size_t)ld);
2077		for (st = dna, i = 0; i < ld; i++, st++) dna1[i] = NCBIstdaa[*st];
	2078	unsigned char *dna_p1, c1, c2, c3, *st;
	2079
	2080	dna_p1 = ckalloc((size_t)ld);
	2081	for (st = dna_p, i = 0; i < ld; i++, st++) dna_p1[i] = NCBIstdaa[*st];
2078	2082	line1[0] = line2[0] = line3[0] = '\0'; x= a[0]; y = a[1]-1;
2079	2083
2080	2084	for (len = 0, j = 2, lines = 0; j < length; j++) {

2086	2090	if (a[j+1] == 2) tmp[2] = ' ';
2087	2091	}
2088	2092	if (i > 0) {
2089		strncpy(&line1[len], (const char *)&dna1[y], i); y+=i;
2090		} else {line1[len] = '-'; i = 1; tmp[0] = NCBIstdaa[pro[x++]];}
	2093	strncpy(&line1[len], (const char *)&dna_p1[y], i);
	2094	y+=i;
	2095	}
	2096	else {
	2097	line1[len] = '-';
	2098	i = 1;
	2099	tmp[0] = NCBIstdaa[pro[x++]];
	2100	}
2091	2101	strncpy(&line2[len], tmp, i);
2092	2102	for (k = 0; k < i; k++) {
2093	2103	if (tmp[k] != ' ' && tmp[k] != '-') {
2094		if (k == 2) tmp[k] = '\\';
2095		else if (k == 1) tmp[k] = '|';
2096		else tmp[k] = '/';
2097		} else tmp[k] = ' ';
	2104	if (k == 2) {tmp[k] = '\\';}
	2105	else if (k == 1) { tmp[k] = '|'; }
	2106	else { tmp[k] = '/'; }
	2107	}
	2108	else { tmp[k] = ' '; }
2098	2109	}
2099	2110	if (i == 1) tmp[0] = ' ';
2100	2111	strncpy(&line3[len], tmp, i);

2103	2114	line1[len] = line2[len] =line3[len] = '\0';
2104	2115	if (len >= WIDTH) {
2105	2116	printf("\n%5d", WIDTH*lines++);
2106		for (k = 10; k <= WIDTH; k+=10)
	2117	for (k = 10; k <= WIDTH; k+=10) {
2107	2118	printf(" . :");
2108		if (k-5 < WIDTH) printf(" .");
	2119	}
	2120	if (k-5 < WIDTH) { printf(" ."); }
2109	2121	c1 = line1[WIDTH]; c2 = line2[WIDTH]; c3 = line3[WIDTH];
2110	2122	line1[WIDTH] = line2[WIDTH] = line3[WIDTH] = '\0';
	2123
2111	2124	printf("\n %s\n %s\n %s\n", line1, line3, line2);
	2125
2112	2126	line1[WIDTH] = c1; line2[WIDTH] = c2; line3[WIDTH] = c3;
2113	2127	strncpy(line1, &line1[WIDTH], sizeof(line1)-1);
2114	2128	strncpy(line2, &line2[WIDTH], sizeof(line2)-1);

2122	2136	if (k-5 < len) printf(" .");
2123	2137	printf("\n %s\n %s\n %s\n", line1, line3, line2);
2124	2138	}
2125
2126	2139
2127	2140	/* alignment store the operation that align the protein and dna sequence.
2128	2141	The code of the number in the array is as follows:

2137	2150	in the protein and dna sequences in the local alignment.
2138	2151
2139	2152	Display looks like where WIDTH is assumed to be divisible by 10.
	2153
	2154	-- this alignment is incorrect, protein phases rather than DNA are shown --
2140	2155
2141	2156	0 . : . : . : . : . : . :
2142	2157	CCTATGATACTGGGATACTGGAACGTCCGCGGACTGACACACCCGATCCGCATGCTCCTG

+157

-53

src/dropfx2.c

281	281	if (hsq[i0] < NMAP && hsq[i0] > mhv) mhv = hsq[i0];
282	282
283	283	if (mhv <= 0) {
284		fprintf (stderr, " maximum hsq <=0 %d\n", mhv);
	284	fprintf (stderr, "*** error [%s:%d] - maximum hsq <=0 %d\n",
	285	__FILE__, __LINE__, mhv);
285	286	exit (1);
286	287	}
287	288

298	299	f_str->hmask = (hmax >> f_str->kshft) - 1;
299	300
300	301	if ((f_str->harr = (int *) calloc (hmax, sizeof (int))) == NULL) {
301		fprintf (stderr, " cannot allocate hash array\n");
	302	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash array [%d]\n",
	303	__FILE__, __LINE__, hmax );
302	304	exit (1);
303	305	}
304	306	if ((f_str->pamh1 = (int *) calloc (ppst->nsq+1, sizeof (int))) == NULL) {
305		fprintf (stderr, " cannot allocate pamh1 array\n");
	307	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh1 array [%d]\n",
	308	__FILE__, __LINE__, ppst->nsq+1);
306	309	exit (1);
307	310	}
308	311	if ((f_str->pamh2 = (int *) calloc (hmax, sizeof (int))) == NULL) {
309		fprintf (stderr, " cannot allocate pamh2 array\n");
	312	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh2 array [%d]\n",
	313	__FILE__, __LINE__, hmax);
310	314	exit (1);
311	315	}
312	316	if ((f_str->link = (int *) calloc (n0, sizeof (int))) == NULL) {
313		fprintf (stderr, " cannot allocate hash link array");
	317	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash link array [%d]",
	318	__FILE__, __LINE__, n0);
314	319	exit (1);
315	320	}
316	321

318	323	if ((f_str->aa1x =(unsigned char *)calloc((size_t)ppst->maxlen+2,
319	324	sizeof(unsigned char)))
320	325	== NULL) {
321		fprintf (stderr, "cannot allocate aa1x array %d\n", ppst->maxlen+2);
	326	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa1x array %d\n",
	327	__FILE__, __LINE__, ppst->maxlen+2);
322	328	exit (1);
323	329	}
324	330	f_str->aa1x++;

326	332	if ((f_str->aa1y =(unsigned char *)calloc((size_t)ppst->maxlen+2,
327	333	sizeof(unsigned char)))
328	334	== NULL) {
329		fprintf (stderr, "cannot allocate aa1y array %d\n", ppst->maxlen+2);
	335	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa1y array %d\n",
	336	__FILE__, __LINE__, ppst->maxlen+2);
330	337	exit (1);
331	338	}
332	339	f_str->aa1y++;

334	341	maxn0 = n0 + 2;
335	342	if ((aa0x =(unsigned char *)calloc((size_t)maxn0,sizeof(unsigned char)))
336	343	== NULL) {
337		fprintf (stderr, "cannot allocate aa0x array %d\n", maxn0);
	344	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa0x array %d\n",
	345	__FILE__, __LINE__, maxn0);
338	346	exit (1);
339	347	}
340	348	aa0x++;

342	350
343	351	if ((aa0y =(unsigned char *)calloc((size_t)maxn0,sizeof(unsigned char)))
344	352	== NULL) {
345		fprintf (stderr, "cannot allocate aa0y array %d\n", maxn0);
	353	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa0y array %d\n",
	354	__FILE__, __LINE__, maxn0);
346	355	exit (1);
347	356	}
348	357	aa0y++;

437	446	#ifndef ALLOCN0
438	447	if ((f_str->diag = (struct dstruct *) calloc ((size_t)MAXDIAG,
439	448	sizeof (struct dstruct)))==NULL) {
440		fprintf (stderr," cannot allocate diagonal arrays: %ld\n",
	449	fprintf (stderr,"*** error [%s:%d] - cannot allocate diagonal arrays: %ld\n",
	450	__FILE__, __LINE__,
441	451	(long) MAXDIAG *sizeof (struct dstruct));
442	452	exit (1);
443	453	};
444	454	#else
445	455	if ((f_str->diag = (struct dstruct *) calloc ((size_t)n0,
446	456	sizeof (struct dstruct)))==NULL) {
447		fprintf (stderr," cannot allocate diagonal arrays: %ld\n",
448		(long)n0*sizeof (struct dstruct));
	457	fprintf (stderr,"*** error [%s:%d] - cannot allocate diagonal arrays: %ld\n",
	458	__FILE__, __LINE__, (long)n0*sizeof (struct dstruct));
449	459	exit (1);
450	460	};
451	461	#endif
452	462
453	463
454	464	if ((waa= (int *)malloc (sizeof(int)*(nsq+1)*n0)) == NULL) {
455		fprintf(stderr,"cannot allocate waa struct %3d\n",nsq*n0);
	465	fprintf(stderr,"*** error [%s:%d] - cannot allocate waa struct %3d\n",
	466	__FILE__, __LINE__, nsq*n0);
456	467	exit(1);
457	468	}
458	469

466	477	f_str->waa0 = waa;
467	478
468	479	if ((waa= (int *)malloc (sizeof(int)*(nsq+1)*n0)) == NULL) {
469		fprintf(stderr,"cannot allocate waa struct %3d\n",nsq*n0);
	480	fprintf(stderr,"*** error [%s:%d] - cannot allocate waa struct %3d\n",
	481	__FILE__, __LINE__, nsq*n0);
470	482	exit(1);
471	483	}
472	484

488	500	maxn0 = max(4*n0,MIN_RES);
489	501	#endif
490	502	if ((res = (int *)calloc((size_t)maxn0,sizeof(int)))==NULL) {
491		fprintf(stderr,"cannot allocate alignment results array %d\n",maxn0);
	503	fprintf(stderr,"*** error [%s:%d] -cannot allocate alignment results array %d\n",
	504	__FILE__, __LINE__, maxn0);
492	505	exit(1);
493	506	}
494	507	f_str->res = res;

690	703	}
691	704
692	705	if (n0+n1+1 >= MAXDIAG) {
693		fprintf(stderr,"n0,n1 too large: %d, %d\n",n0,n1);
	706	fprintf(stderr,"*** error [%s:%d] - n0,n1 too large > %d: %d, %d\n",
	707	__FILE__, __LINE__, n0,n1, MAXDIAG);
694	708	rst->score[0] = rst->score[1] = rst->score[2] = -1;
695	709	return;
696	710	}

1523	1537	}
1524	1538
1525	1539	if (i >= max_res) {
1526		fprintf(stderr," alignment truncated: %d/%d\n", max_res,i);
	1540	fprintf(stderr,"*** error [%s:%d] - alignment truncated: %d > %d (max_res)\n",
	1541	__FILE__, __LINE__, i, max_res);
1527	1542	}
1528	1543
1529	1544	up = &up[-3]; down = &down[-3]; tp = &tp[-3];

1580	1595	ld += 2;
1581	1596	init_ROW(up, ld+1); /* set to zero */
1582	1597	init_ROW(down, ld+1); /* set to zero */
1583
1584	1598
1585	1599	cur = up+1;
1586	1600	last = down+1;

2070	2084	#define XTERNAL
2071	2085	#include "upam.h"
2072	2086
	2087	/* this code is not used by the program, it was included for testing */
	2088	/* display_alig(align_enc, dna_p, *prot, length, ld) takes the
	2089
	2090	alignment encoding, and the DNA and protein sequences, and produces an alignment.
	2091	*dna_p is the three phases of the translated DNA sequence
	2092	*prot is the original protein sequence
	2093
	2094	length is the length of the encoding
	2095	ld is the length of the alignment(?)
	2096
	2097	the first two entries in align_enc[] are the start of the protein
	2098	and DNA sequences.
	2099
	2100	The encoding is: (why no code 1?:)
	2101
	2102	0: delete amino acid.
	2103	2: frame shift, 2 nucleotides match with an amino acid
	2104	3: match an amino acid with a codon
	2105	4: the other type of frame shift
	2106	5: delete of a codon
	2107
	2108	One of the properties of this encoding is that it indicates the
	2109	amount that the DNA sequence index needs to be incremented after
	2110	prot match (except for 5)
	2111
	2112	*/
	2113
2073	2114	extern void
2074		display_alig(int *a, unsigned char *dna, unsigned char * pro, int length, int ld)
	2115	display_alig(int *a, unsigned char *dna_p, unsigned char * pro, int length, int ld)
2075	2116	{
2076	2117	int len = 0, i, j, x, y, lines, k;
2077	2118	char line1[100], line2[100], line3[100],
2078	2119	tmp[10] = " ";
2079		unsigned char *dna1, c1, c2, c3, *st;
2080
2081		dna1 = ckalloc((size_t)ld);
2082		for (st = dna, i = 0; i < ld; i++, st++) dna1[i] = NCBIstdaa[*st];
2083		line1[0] = line2[0] = line3[0] = '\0'; x= a[0]; y = a[1]-1;
	2120	unsigned char *dna_p1, c1, c2, c3, *st;
	2121
	2122	dna_p1 = ckalloc((size_t)ld); /* dna_p1 is the ascii (sq0) translated-DNA residue */
	2123
	2124	/* generate the ascii aa characters */
	2125	for (st = dna_p, i = 0; i < ld; i++, st++) {
	2126	dna_p1[i] = NCBIstdaa[*st];
	2127	}
	2128	line1[0] = line2[0] = line3[0] = '\0';
	2129
	2130	x= a[0]; /* start in protein */
	2131	y = a[1]-1; /* start in DNA */
2084	2132
2085	2133	for (len = 0, j = 2, lines = 0; j < length; j++) {
2086		i = a[j];
	2134	i = a[j]; /* i is align_enc value 0-5 */
2087	2135	/*printf("%d %d %d\n", i, len, b->j);*/
	2136
2088	2137	if (i > 0 && i < 5) tmp[i-2] = NCBIstdaa[pro[x++]];
2089		if (i == 5) {
2090		i = 3; tmp[0] = tmp[1] = tmp[2] = '-';
	2138	if (i == 5) { /* special case */
	2139	i = 3; /* increment DNA value by 3, prot by 0 */
	2140	tmp[0] = tmp[1] = tmp[2] = '-';
2091	2141	if (a[j+1] == 2) tmp[2] = ' ';
2092	2142	}
2093	2143	if (i > 0) {
2094		strncpy(&line1[len], (const char *)&dna1[y], i); y+=i;
2095		} else {line1[len] = '-'; i = 1; tmp[0] = NCBIstdaa[pro[x++]];}
	2144	strncpy(&line1[len], (const char *)&dna_p1[y], i);
	2145	y+=i;
	2146	}
	2147	else {
	2148	line1[len] = '-';
	2149	i = 1;
	2150	tmp[0] = NCBIstdaa[pro[x++]];
	2151	}
	2152
2096	2153	strncpy(&line2[len], tmp, i);
	2154
2097	2155	for (k = 0; k < i; k++) {
2098	2156	if (tmp[k] != ' ' && tmp[k] != '-') {
2099	2157	if (k == 2) tmp[k] = '\\';
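
The comment added above `display_alig()` describes the alignment encoding: the first two entries are the start positions, and each following code says how far to advance in the translated DNA and whether a protein residue is consumed. A minimal sketch of walking that encoding follows, based on how `display_alig()` updates its `x` (protein) and `y` (`dna_p`) indices in this hunk. The `struct enc_span` type and the `walk_align_enc()` name are assumptions for illustration.

```c
/* hypothetical result type: how much of each sequence was consumed */
struct enc_span {
  int prot_used;   /* protein residues consumed */
  int dna_used;    /* positions consumed in the dna_p index */
};

/* enc[0] and enc[1] hold the start positions; enc[2..length-1]
   hold the operation codes 0, 2, 3, 4 or 5. */
static struct enc_span walk_align_enc(const int *enc, int length) {
  struct enc_span s = {0, 0};
  int j;

  for (j = 2; j < length; j++) {
    int op = enc[j];

    if (op == 5) {          /* special case: advance DNA by 3, protein by 0 */
      s.dna_used += 3;
    }
    else if (op == 0) {     /* protein residue with no DNA: advance protein only */
      s.prot_used += 1;
    }
    else {                  /* 2, 3, 4: one residue, op positions of DNA */
      s.prot_used += 1;
      s.dna_used += op;
    }
  }
  return s;
}
```

For the codes 3, 3, 2, 0, 5, 4 this walk consumes 5 protein residues and 15 DNA positions, which illustrates the property noted in the comment: except for code 5, the code value is the DNA increment.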

2128	2186	printf("\n %s\n %s\n %s\n", line1, line3, line2);
2129	2187	}
2130	2188
2131
2132	2189	/* alignment store the operation that align the protein and dna sequence.
2133	2190	The code of the number in the array is as follows:
2134	2191	0: delete of an amino acid.

2137	2194	4: the other type of frame shift
2138	2195	5: delete of a codon
2139	2196
2140
2141	2197	Also the first two element of the array stores the starting point
2142	2198	in the protein and dna sequences in the local alignment.
2143	2199

2378	2434
2379	2435	/* now we need alignment storage - get it */
2380	2436	if ((cur_ares->res = (int *)calloc((size_t)max_res,sizeof(int)))==NULL) {
2381		fprintf(stderr," *** cannot allocate alignment results array %d\n",max_res);
	2437	fprintf(stderr,"*** error [%s:%d] - cannot allocate alignment results array %d\n",
	2438	__FILE__, __LINE__, max_res);
2382	2439	exit(1);
2383	2440	}
2384	2441

2599	2656	have_ares = 0x3; /* set 0x2 bit to indicate local copy */
2600	2657
2601	2658	if ((a_res = (struct a_res_str *)calloc(1, sizeof(struct a_res_str)))==NULL) {
2602		fprintf(stderr," [do_walign] Cannot allocate a_res");
	2659	fprintf(stderr,"*** error [%s:%d] - cannot allocate a_res [%lu]",
	2660	__FILE__, __LINE__, sizeof(struct a_res_str));
2603	2661	return NULL;
2604	2662	}
2605	2663

2647	2705	#endif
2648	2706	/*
2649	2707	if (a_res->res[0] != 3) {
2650		fprintf(stderr, "*** alignment does not start with match: %d\n",a_res->res[0]);
	2708	fprintf(stderr, "*** error [%s:%d] - alignment does not start with match: %d\n",
	2709	__FILE__, __LINE__, a_res->res[0]);
2651	2710	}
2652	2711	*/
2653	2712
2654	2713	#ifdef DEBUG
2655	2714	if (adler32(1L,aa1,n1) != adler32_crc) {
2656		fprintf(stderr,"[dropfx.c/do_walign] adler32_crc mismatch n1: %d\n",n1);
	2715	fprintf(stderr,"*** error [%s:%d] - adler32_crc mismatch n1: %d\n",
	2716	__FILE__, __LINE__, n1);
2657	2717	}
2658	2718	#endif
2659	2719

2730	2790	}
2731	2791
2732	2792	/*
2733		Alignment: store the operation that align the protein and dna sequence.
	2793	Alignment: store the operation that aligns the protein and dna sequences.
2734	2794	The code of the number in the array is as follows:
2735	2795	0: delete of an amino acid.
2736	2796	2: frame shift, 2 nucleotides match with an amino acid

2977	3037	else if (calc_func_mode == CALC_ID || calc_func_mode == CALC_ID_DOM) {
2978	3038	have_ann = (annotp_p && annotp_p->n_annot > 0);
2979	3039	spa_p = &spa_c;
2980		sp0_p = &sp0_c;
2981		sp1_p = &sp1_c;
2982
2983		sp0a_p = &sp0a_c;
2984		sp1a_p = &sp1a_c;
	3040	sp0_p = &sp1_c;
	3041	sp1_p = &sp0_c;
	3042
	3043	sp0a_p = &sp1a_c;
	3044	sp1a_p = &sp0a_c;
2985	3045	annot_fmt = 3;
2986	3046
2987	3047	/* does not require aa0a/aa1a, only for variants */
2988	3048	}
2989	3049	else if (calc_func_mode == CALC_CODE) {
2990	3050	spa_p = &spa_c;
2991		sp0_p = &sp0_c;
2992		sp1_p = &sp1_c;
2993
2994		sp0a_p = &sp0a_c;
2995		sp1a_p = &sp1a_c;
	3051	sp0_p = &sp1_c;
	3052	sp1_p = &sp0_c;
	3053
	3054	sp0a_p = &sp1a_c;
	3055	sp1a_p = &sp0a_c;
2996	3056
2997	3057	show_code = (display_code & (SHOW_CODE_MASK+SHOW_CODE_EXT)); /* see defs.h; SHOW_CODE_ALIGN=2,_CIGAR=3,_CIGAR_EXT=4 */
2998	3058	annot_fmt = 2;

3017	3077	rpmax = &a_res->res[a_res->nres];
3018	3078
3019	3079	lenc = not_c = aln->nident = aln->nmismatch = aln->nsim = aln->npos = ngap_p = ngap_d = nfs= 0;
	3080
3020	3081	i0 = a_res->min1;
3021	3082	i1 = a_res->min0;
3022	3083

3141	3202	*spa_p = M_DEL;
3142	3203
3143	3204	if (calc_func_mode == CALC_CODE) {
	3205	#ifndef TFAST
3144	3206	update_code(align_code_dyn, update_data_p, 2, spa_p,sp0_p,*sp1_p);
	3207	#else
	3208	update_code(align_code_dyn, update_data_p, 2, spa_p,sp1_p,*sp0_p);
	3209	#endif
	3210
3145	3211	}
3146	3212
3147	3213	if (calc_func_mode == CALC_CONS) {

3218	3284	spa_p = align_type(itmp, sp0_p, *sp1_p, 0, aln, ppst->pam_x_id_sim);
3219	3285
3220	3286	if (calc_func_mode == CALC_CODE) {
	3287	#ifndef TFAST
3221	3288	update_code(align_code_dyn, update_data_p, 3, spa_p,sp0_p,*sp1_p);
	3289	#else
	3290	update_code(align_code_dyn, update_data_p, 3, spa_p,sp1_p,*sp0_p);
	3291	#endif
3222	3292	}
3223	3293
3224	3294	d1_alen++;

3320	3390	if (cumm_seq_score) *i_spa++ = itmp;
3321	3391
3322	3392	if (calc_func_mode == CALC_CODE) {
	3393	#ifndef TFAST
3323	3394	update_code(align_code_dyn, update_data_p, 3, spa_p, sp0_p, *sp1_p);
	3395	#else
	3396	update_code(align_code_dyn, update_data_p, 3, spa_p, sp1_p, *sp0_p);
	3397	#endif
3324	3398
3325	3399	if (have_push_features) {
3326	3400	add_annot_code(have_ann, sp0_p, sp1_p, *sp1a_p,

3366	3440	*spa_p = M_DEL;
3367	3441
3368	3442	if (calc_func_mode == CALC_CODE) {
	3443	#ifndef TFAST
3369	3444	update_code(align_code_dyn, update_data_p, 4, spa_p, sp0_p, *sp1_p);
	3445	#else
	3446	update_code(align_code_dyn, update_data_p, 4, spa_p, sp1_p, *sp0_p);
	3447	#endif
3370	3448	}
3371	3449
3372	3450	if (calc_func_mode == CALC_CONS) {sp0_p++; sp1_p++; spa_p++;}

3435	3513	if (*spa_p == M_IDENT) {d1_ident++;}
3436	3514
3437	3515	if (calc_func_mode == CALC_CODE) {
	3516	#ifndef TFAST
3438	3517	update_code(align_code_dyn, update_data_p, 3, spa_p,sp0_p,*sp1_p);
	3518	#else
	3519	update_code(align_code_dyn, update_data_p, 3, spa_p,sp1_p,*sp0_p);
	3520	#endif
3439	3521	}
3440	3522
3441	3523	if (cumm_seq_score) *i_spa++ = itmp;

3484	3566
3485	3567	if (calc_func_mode == CALC_CODE) {
3486	3568	*spa_p = 5;
	3569	#ifndef TFAST
3487	3570	update_code(align_code_dyn, update_data_p, 5, spa_p,sp0_p,*sp1_p);
	3571	#else
	3572	update_code(align_code_dyn, update_data_p, 5, spa_p,sp1_p,*sp0_p);
	3573	#endif
3488	3574	}
3489	3575
3490	3576	if (calc_func_mode == CALC_CONS) {sp0_p++; sp1_p++; spa_p++;}

3614	3700	*/
3615	3701
3616	3702	static struct update_code_str *
3617		init_update_data(show_code) {
	3703	init_update_data(int show_code) {
3618	3704
3619	3705	struct update_code_str *update_data_p;
3620	3706

3716	3802
3717	3803	/* only aligned identities update counts */
3718	3804	if (op==3 && sim_code == M_IDENT) {
3719		up_dp->p_op_cnt++;
3720		return;
	3805	if ((sp0 == '*' && (sp1 == '*' || toupper(sp1) == 'U'))
	3806	|| (sp1 == '*' && (sp0 == '*' || toupper(sp0) == 'U'))) {
	3807	if (up_dp->p_op_cnt > 0) {
	3808	sprintf(tmp_str,"%d**",up_dp->p_op_cnt);
	3809	up_dp->p_op_cnt = 0;
	3810	return;
	3811	}
	3812	}
	3813	else {
	3814	up_dp->p_op_cnt++;
	3815	return;
	3816	}
3721	3817	}
3722	3818	else {
3723	3819	if (up_dp->p_op_cnt > 0) {

3785	3881	}
3786	3882	}
3787	3883	else { /* have a termination codon, output for !SHOW_CODE_CIGAR */
3788		if (!up_dp->cigar_order) {
3789		if (sp0 == '*' || sp1 == '*') { op = 6;}
3790		}
3791		else if (up_dp->show_ext && (sp0 != sp1)) { op = 1;}
	3884	if (!up_dp->cigar_order) { /* -m9c : -m9C and -m8CC are cigar_order */
	3885	if (sp0 == '*' || sp1 == '*') {
	3886	/* op = 6 gets '*' from op_map="-x/=\\+*" when the string is closed */
	3887	op = 6;
	3888	}
	3889	}
	3890	else if (sp0=='*' && sp1=='*') {
	3891	op=6;
	3892	}
	3893	else if (up_dp->show_ext && (sp0 != sp1)) {
	3894	op = 1;
	3895	}
3792	3896	}
3793	3897
3794	3898	if (up_dp->p_op_cnt == 0) {

+113

-53

src/dropfz3.c

218	218	char le[MAXLC+1][64];
219	219
220	220	if (naa > MAXLC) {
221		fprintf(stderr,"* dropfz2.c compilation problem naa(%d) > MAXLX(%d) *\n",
222		naa, MAXLC);
	221	fprintf(stderr,"* error [%s:%d] - compilation problem naa(%d) > MAXLC(%d) *\n",
	222	__FILE__, __LINE__, naa, MAXLC);
223	223	}
224	224
225	225	if ((weighti=(struct wgt )calloc((size_t)(naa+1),sizeof(struct wgt )))
226	226	==NULL) {
227		fprintf(stderr," cannot allocate weights array: %d\n",naa);
	227	fprintf(stderr,"*** error [%s:%d] - cannot allocate weights array: %d\n",
	228	__FILE__, __LINE__, naa);
228	229	exit(1);
229	230	}
230	231

233	234	for (aa=0; aa <= naa; aa++) {
234	235	if ((weight[aa]=(struct wgt *)calloc((size_t)256,sizeof(struct wgt)))
235	236	==NULL) {
236		fprintf(stderr," cannot allocate weight[]: %d/%d\n",aa,naa);
	237	fprintf(stderr,"*** error [%s:%d] - cannot allocate weight[]: %d/%d\n",
	238	__FILE__, __LINE__, aa,naa);
237	239	exit(1);
238	240	}
239	241	}

242	244	if (weightci !=NULL) {
243	245	if ((weightci=(struct wgtc *)calloc((size_t)(naa+1),
244	246	sizeof(struct wgtc *)))==NULL) {
245		fprintf(stderr," cannot allocate weight_c array: %d\n",naa);
	247	fprintf(stderr,"*** error [%s:%d] - cannot allocate weight_c array: %d\n",
	248	__FILE__, __LINE__, naa);
246	249	exit(1);
247	250	}
248	251	weightc = *weightci;

250	253	for (aa=0; aa <= naa; aa++) {
251	254	if ((weightc[aa]=(struct wgtc *)calloc((size_t)256,sizeof(struct wgtc)))
252	255	==NULL) {
253		fprintf(stderr," cannot allocate weightc[]: %d/%d\n",aa,naa);
	256	fprintf(stderr,"*** error [%s:%d] - cannot allocate weightc[]: %d/%d\n",
	257	__FILE__, __LINE__, aa,naa);
254	258	exit(1);
255	259	}
256	260	}

411	415	#endif
412	416
413	417	if (nt[NT_N] != 'N') {
414		fprintf(stderr," nt[NT_N] (%d) != 'X' (%c) - recompile\n",NT_N,nt[NT_N]);
	418	fprintf(stderr,"*** error [%s:%d] - nt[NT_N] (%d) != 'X' (%c) - recompile\n",
	419	__FILE__, __LINE__, NT_N,nt[NT_N]);
415	420	exit(1);
416	421	}
417	422

460	465	if ((aa0x =(unsigned char *)calloc((size_t)maxn0,
461	466	sizeof(unsigned char)))
462	467	== NULL) {
463		fprintf (stderr, "cannot allocate aa0x array %d\n", maxn0);
	468	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa0x array %d\n",
	469	__FILE__, __LINE__, maxn0);
464	470	exit (1);
465	471	}
466	472	aa0x++;

470	476	if ((aa0v =(unsigned char *)calloc((size_t)maxn0,
471	477	sizeof(unsigned char)))
472	478	== NULL) {
473		fprintf (stderr, "cannot allocate aa0v array %d\n", maxn0);
	479	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa0v array %d\n",
	480	__FILE__, __LINE__, maxn0);
474	481	exit (1);
475	482	}
476	483	aa0v++;

522	529	if (hsq[i0] < NMAP && hsq[i0] > mhv)
523	530	mhv = ppst->hsq[i0];
524	531
525		if (mhv <= 0)
526		{
527		fprintf (stderr, " maximum hsq <=0 %d\n", mhv);
	532	if (mhv <= 0) {
	533	fprintf (stderr, "*** error [%s:%d] - maximum hsq <=0 %d\n",
	534	__FILE__, __LINE__, mhv);
528	535	exit (1);
529	536	}
530	537

539	546	f_str->hmask = (hmax >> f_str->kshft) - 1;
540	547
541	548	if ((f_str->harr = (int *) calloc (hmax, sizeof (int))) == NULL) {
542		fprintf (stderr, " cannot allocate hash array\n");
	549	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash array [%d]\n",
	550	__FILE__, __LINE__, hmax);
543	551	exit (1);
544	552	}
545	553	if ((f_str->pamh1 = (int *) calloc (ppst->nsq+1, sizeof (int))) == NULL) {
546		fprintf (stderr, " cannot allocate pamh1 array\n");
	554	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh1 array [%d]\n",
	555	__FILE__, __LINE__, ppst->nsq+1);
547	556	exit (1);
548	557	}
549		if ((f_str->pamh2 = (int *) calloc (hmax, sizeof (int))) == NULL) {
550		fprintf (stderr, " cannot allocate pamh2 array\n");
	558	if ((f_str->pamh2 = (int *)calloc (hmax, sizeof (int))) == NULL) {
	559	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh2 array [%d]\n",
	560	__FILE__, __LINE__, hmax);
551	561	exit (1);
552	562	}
553	563	if ((f_str->link = (int *) calloc (n0, sizeof (int))) == NULL) {
554		fprintf (stderr, " cannot allocate hash link array");
	564	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash link array [%d]",
	565	__FILE__, __LINE__, n0);
555	566	exit (1);
556	567	}
557	568

614	625	#ifndef ALLOCN0
615	626	if ((f_str->diag = (struct dstruct *) calloc ((size_t)MAXDIAG,
616	627	sizeof (struct dstruct)))==NULL) {
617		fprintf (stderr," cannot allocate diagonal arrays: %lu\n",
618		MAXDIAG *sizeof (struct dstruct));
	628	fprintf (stderr,"*** error [%s:%d] - cannot allocate diagonal arrays: %lu\n",
	629	__FILE__, __LINE__, MAXDIAG *sizeof (struct dstruct));
619	630	exit (1);
620	631	};
621	632	#else
622	633	if ((f_str->diag = (struct dstruct *) calloc ((size_t)n0,
623	634	sizeof (struct dstruct)))==NULL) {
624		fprintf (stderr," cannot allocate diagonal arrays: %ld\n",
625		(long)n0*sizeof (struct dstruct));
	635	fprintf (stderr,"*** error [%s:%d] - cannot allocate diagonal arrays: %ld\n",
	636	__FILE__, __LINE__, (long)n0*sizeof (struct dstruct));
626	637	exit (1);
627	638	};
628	639	#endif

636	647	if ((f_str->aa1x =(unsigned char *)calloc((size_t)ppst->maxlen+4,
637	648	sizeof(unsigned char)))
638	649	== NULL) {
639		fprintf (stderr, "cannot allocate aa1x array %d\n", ppst->maxlen+4);
	650	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa1x array %d\n",
	651	__FILE__, __LINE__, ppst->maxlen+4);
640	652	exit (1);
641	653	}
642	654	f_str->aa1x++;
643	655
644	656	if ((f_str->aa1v =(unsigned char *)calloc((size_t)ppst->maxlen+4,
645	657	sizeof(unsigned char))) == NULL) {
646		fprintf (stderr, "cannot allocate aa1v array %d\n", ppst->maxlen+4);
	658	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa1v array %d\n",
	659	__FILE__, __LINE__, ppst->maxlen+4);
647	660	exit (1);
648	661	}
649	662	f_str->aa1v++;

651	664	#endif
652	665
653	666	if ((waa= (int *)malloc (sizeof(int)*(nsq+1)*n0)) == NULL) {
654		fprintf(stderr,"cannot allocate waa struct %3d\n",nsq*n0);
	667	fprintf(stderr,"*** error [%s:%d] - cannot allocate waa struct %3d\n",
	668	__FILE__, __LINE__, nsq*n0);
655	669	exit(1);
656	670	}
657	671

670	684	maxn0 = max(4*n0,MIN_RES);
671	685	#endif
672	686	if ((res = (int *)calloc((size_t)maxn0,sizeof(int)))==NULL) {
673		fprintf(stderr,"cannot allocate alignment results array %d\n",maxn0);
	687	fprintf(stderr,"*** error [%s:%d] - cannot allocate alignment results array %d\n",
	688	__FILE__, __LINE__, maxn0);
674	689	exit(1);
675	690	}
676	691	f_str->res = res;

848	863	}
849	864
850	865	if (n0+n1+1 >= MAXDIAG) {
851		fprintf(stderr,"n0,n1 too large: %d, %d\n",n0,n1);
	866	fprintf(stderr,"*** error [%s:%d] - n0,n1 too large > %d: %d, %d\n",
	867	__FILE__, __LINE__, n0,n1, MAXDIAG);
852	868	rst->score[0] = rst->score[1] = rst->score[2] = -1;
853	869	return;
854	870	}

1096	1112	aa1x = f_str->aa1x;
1097	1113	#ifdef DEBUG
1098	1114	if (frame > 1) {
1099		fprintf(stderr, "*** fz_walign - frame: %d - out of range [0,1]\n",frame);
	1115	fprintf(stderr, "*** error [%s:%d] - fz_walign - frame: %d - out of range [0,1]\n",
	1116	__FILE__, __LINE__, frame);
1100	1117	}
1101	1118	#endif
1102	1119

1632	1649	aq = ap->next; free(ap); ap = aq;
1633	1650	}
1634	1651	if (i >= max_res)
1635		fprintf(stderr,"*alignment truncated: %d/%d*\n", max_res,i);
	1652	fprintf(stderr,"* error [%s:%d] - alignment truncated: %d >= %d*\n",
	1653	__FILE__, __LINE__, i, max_res);
1636	1654
1637	1655	/* up = &up[-3]; down = &down[-3]; tp = &tp[-3]; */
1638	1656	free(&f_str->up[-3]); free(&f_str->tp[-3]); free(&f_str->down[-3]);

2478	2496
2479	2497	/* now we need alignment storage - get it */
2480	2498	if ((cur_ares->res = (int *)calloc((size_t)max_res,sizeof(int)))==NULL) {
2481		fprintf(stderr," *** cannot allocate alignment results array %d\n",max_res);
	2499	fprintf(stderr,"*** error [%s:%d] - cannot allocate alignment results array %d\n",
	2500	__FILE__, __LINE__, max_res);
2482	2501	exit(1);
2483	2502	}
2484	2503

2649	2668	have_ares = 0x3; /* set 0x2 bit to indicate local copy */
2650	2669
2651	2670	if ((a_res = (struct a_res_str *)calloc(1, sizeof(struct a_res_str)))==NULL) {
2652		fprintf(stderr," [do_walign] Cannot allocate a_res");
	2671	fprintf(stderr,"*** error [%s:%d] - cannot allocate a_res [%lu]",
	2672	__FILE__, __LINE__, sizeof(struct a_res_str));
2653	2673	return NULL;
2654	2674	}
2655	2675

2940	2960	update_data_p = init_update_data(show_code);
2941	2961	}
2942	2962	else {
2943		fprintf(stderr,"*** error [%s:%d] --- cal_cons_u() invalid calc_func_mode: %d\n",
	2963	fprintf(stderr,"*** error [%s:%d] --- calc_cons_u() invalid calc_func_mode: %d\n",
2944	2964	__FILE__, __LINE__, calc_func_mode);
2945	2965	exit(1);
2946	2966	}

2972	2992	else if (calc_func_mode == CALC_ID || calc_func_mode == CALC_ID_DOM) {
2973	2993	have_ann = (annotp_p && annotp_p->n_annot > 0);
2974	2994	spa_p = &spa_c;
2975		sp0_p = &sp0_c;
2976		sp1_p = &sp1_c;
2977
2978		sp0a_p = &sp0a_c;
2979		sp1a_p = &sp1a_c;
	2995	sp0_p = &sp1_c;
	2996	sp1_p = &sp0_c;
	2997
	2998	sp0a_p = &sp1a_c;
	2999	sp1a_p = &sp0a_c;
2980	3000	annot_fmt = 3;
2981	3001
2982	3002	/* does not require aa0a/aa1a, only for variants */
2983	3003	}
2984	3004	else if (calc_func_mode == CALC_CODE) {
2985	3005	spa_p = &spa_c;
2986		sp0_p = &sp0_c;
2987		sp1_p = &sp1_c;
2988
2989		sp0a_p = &sp0a_c;
2990		sp1a_p = &sp1a_c;
	3006	sp0_p = &sp1_c;
	3007	sp1_p = &sp0_c;
	3008
	3009	sp0a_p = &sp1a_c;
	3010	sp1a_p = &sp0a_c;
2991	3011
2992	3012	show_code = (display_code & (SHOW_CODE_MASK+SHOW_CODE_EXT)); /* see defs.h; SHOW_CODE_ALIGN=2,_CIGAR=3,_CIGAR_EXT=4 */
2993	3013	annot_fmt = 2;

3001	3021	update_data_p = init_update_data(show_code);
3002	3022	}
3003	3023	else {
3004		fprintf(stderr,"*** error [%s:%d] --- cal_cons_u() invalid calc_func_mode: %d\n",
	3024	fprintf(stderr,"*** error [%s:%d] --- calc_cons_u() invalid calc_func_mode: %d\n",
3005	3025	__FILE__, __LINE__, calc_func_mode);
3006	3026	exit(1);
3007	3027	}

3117	3137	if (cumm_seq_score) *i_spa++ = itmp;
3118	3138
3119	3139	if (calc_func_mode == CALC_CODE) {
	3140	#ifndef TFAST
3120	3141	update_code(align_code_dyn, update_data_p, 3, spa_p, sp0_p, *sp1_p);
	3142	#else
	3143	update_code(align_code_dyn, update_data_p, 3, spa_p, sp1_p, *sp0_p);
	3144	#endif
3121	3145
3122	3146	if (have_ann && have_push_features) {
3123	3147	add_annot_code(have_ann, sp0_p, sp1_p, *sp1a_p,

3159	3183	*spa_p = M_DEL;
3160	3184
3161	3185	if (calc_func_mode == CALC_CODE) {
	3186	#ifndef TFAST
3162	3187	update_code(align_code_dyn, update_data_p, 2, spa_p,sp0_p,*sp1_p);
	3188	#else
	3189	update_code(align_code_dyn, update_data_p, 2, spa_p,sp1_p,*sp0_p);
	3190	#endif
3163	3191	}
3164	3192
3165	3193	if (cumm_seq_score) *i_spa++ = ppst->gshift;

3232	3260	spa_p = align_type(itmp, sp0_p, *sp1_p, 0, aln, ppst->pam_x_id_sim);
3233	3261
3234	3262	if (calc_func_mode == CALC_CODE) {
	3263	#ifndef TFAST
3235	3264	update_code(align_code_dyn, update_data_p, 3, spa_p,sp0_p,*sp1_p);
	3265	#else
	3266	update_code(align_code_dyn, update_data_p, 3, spa_p,sp1_p,*sp0_p);
	3267	#endif
3236	3268	}
3237	3269
3238	3270	d1_alen++;

3279	3311	*spa_p = M_DEL;
3280	3312
3281	3313	if (calc_func_mode == CALC_CODE) {
	3314	#ifndef TFAST
3282	3315	update_code(align_code_dyn, update_data_p, 4, spa_p,sp0_p,*sp1_p);
	3316	#else
	3317	update_code(align_code_dyn, update_data_p, 4, spa_p,sp1_p,*sp0_p);
	3318	#endif
3283	3319	}
3284	3320
3285	3321	if (calc_func_mode == CALC_CONS) {sp0_p++; sp1_p++; spa_p++;}

3344	3380	spa_p = align_type(itmp, sp0_p, *sp1_p, 0, aln, ppst->pam_x_id_sim);
3345	3381
3346	3382	if (calc_func_mode == CALC_CODE) {
	3383	#ifndef TFAST
3347	3384	update_code(align_code_dyn, update_data_p, 3, spa_p,sp0_p,*sp1_p);
	3385	#else
	3386	update_code(align_code_dyn, update_data_p, 3, spa_p,sp1_p,*sp0_p);
	3387	#endif
3348	3388	}
3349	3389
3350	3390	d1_alen++;

3392	3432
3393	3433	if (calc_func_mode == CALC_CODE) {
3394	3434	*spa_p = 5;
	3435	#ifndef TFAST
3395	3436	update_code(align_code_dyn, update_data_p, 5, spa_p,sp0_p,*sp1_p);
	3437	#else
	3438	update_code(align_code_dyn, update_data_p, 5, spa_p,sp1_p,*sp0_p);
	3439	#endif
3396	3440	}
3397	3441
3398	3442	lenc++;

3408	3452
3409	3453	if (calc_func_mode == CALC_CODE) {
3410	3454	*spa_p = 5; /* indel code */
	3455	#ifndef TFAST
3411	3456	update_code(align_code_dyn, update_data_p, 0, spa_p,sp0_p,*sp1_p);
	3457	#else
	3458	update_code(align_code_dyn, update_data_p, 0, spa_p,sp1_p,*sp0_p);
	3459	#endif
3412	3460	}
3413	3461
3414	3462	if (cumm_seq_score) {

3594	3642	*/
3595	3643
3596	3644	static struct update_code_str *
3597		init_update_data(show_code) {
	3645	init_update_data(int show_code) {
3598	3646
3599	3647	struct update_code_str *update_data_p;
3600	3648

3640	3688
3641	3689	if (!up_dp) return;
3642	3690
3643		if (up_dp->btop_enc) {
3644		sprintf(tmp_cnt,"%d",up_dp->p_op_cnt);
3645		up_dp->p_op_cnt = 0;
3646		}
3647		else {
3648		sprintf_code(tmp_cnt,up_dp, up_dp->p_op_idx, up_dp->p_op_cnt);
3649		}
3650		dyn_strcat(align_code_dyn, tmp_cnt);
	3691	if (up_dp->p_op_cnt) {
	3692	if (up_dp->btop_enc) {
	3693	sprintf(tmp_cnt,"%d",up_dp->p_op_cnt);
	3694	up_dp->p_op_cnt = 0;
	3695	}
	3696	else {
	3697	sprintf_code(tmp_cnt,up_dp, up_dp->p_op_idx, up_dp->p_op_cnt);
	3698	}
	3699	dyn_strcat(align_code_dyn, tmp_cnt);
	3700	}
3651	3701
3652	3702	free(up_dp);
3653	3703	}

3700	3750
3701	3751	/* only aligned identities update counts */
3702	3752	if (op==3 && sim_code == M_IDENT) {
3703		up_dp->p_op_cnt++;
3704		return;
	3753	if ((sp0 == '*' && (sp1 == '*' || toupper(sp1) == 'U'))
	3754	|| (sp1 == '*' && (sp0 == '*' || toupper(sp0) == 'U'))) {
	3755	if (up_dp->p_op_cnt > 0) {
	3756	sprintf(tmp_str,"%d**",up_dp->p_op_cnt);
	3757	up_dp->p_op_cnt = 0;
	3758	return;
	3759	}
	3760	}
	3761	else {
	3762	up_dp->p_op_cnt++;
	3763	return;
	3764	}
3705	3765	}
3706	3766	else {
3707	3767	if (up_dp->p_op_cnt > 0) {

+32

-27

src/dropnfa.c

208	208	if (hsq[i0] < NMAP && hsq[i0] > mhv) mhv = hsq[i0];
209	209
210	210	if (mhv <= 0) {
211		fprintf (stderr, " maximum hsq <=0 %d\n", mhv);
	211	fprintf (stderr, "*** error [%s:%d] maximum hsq <=0 %d\n", __FILE__, __LINE__, mhv);
212	212	exit (1);
213	213	}
214	214

222	222	f_str->hmask = (hmax >> f_str->kshft) - 1;
223	223
224	224	if ((f_str->harr = (int *) calloc (hmax, sizeof (int))) == NULL) {
225		fprintf (stderr, " *** cannot allocate hash array: hmax: %d hmask: %d\n",
226		hmax, f_str->hmask);
	225	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash array: hmax: %d hmask: %d\n",
	226	__FILE__,__LINE__,hmax, f_str->hmask);
227	227	exit (1);
228	228	}
229	229
230	230	if ((f_str->pamh1 = (int *) calloc (nsq+1, sizeof (int))) == NULL) {
231		fprintf (stderr, " *** cannot allocate pamh1 array nsq=%d\n",nsq);
	231	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh1 array nsq=%d\n",
	232	__FILE__, __LINE__, nsq);
232	233	exit (1);
233	234	}
234	235
235	236	if ((f_str->pamh2 = (int *) calloc (hmax, sizeof (int))) == NULL) {
236		fprintf (stderr, " *** cannot allocate pamh2 array hmax=%d\n",hmax);
	237	fprintf (stderr, "*** error [%s:%d] - cannot allocate pamh2 array hmax=%d\n",
	238	__FILE__, __LINE__,hmax);
237	239	exit (1);
238	240	}
239	241
240	242	if ((f_str->link = (int *) calloc (n0, sizeof (int))) == NULL) {
241		fprintf (stderr, " *** cannot allocate hash link array n0=%d",n0);
	243	fprintf (stderr, "*** error [%s:%d] - cannot allocate hash link array n0=%d",
	244	__FILE__, __LINE__, n0);
242	245	exit (1);
243	246	}
244	247

299	302	f_str->ndo = 0;
300	303	if ((f_str->diag = (struct dstruct *) calloc ((size_t)MAXDIAG,
301	304	sizeof (struct dstruct)))==NULL) {
302		fprintf (stderr," *** cannot allocate diagonal arrays: %lu\n",
303		MAXDIAG *sizeof (struct dstruct));
	305	fprintf (stderr,"*** error [%s:%d] - cannot allocate diagonal arrays: %lu\n",
	306	__FILE__, __LINE__, MAXDIAG *sizeof (struct dstruct));
304	307	exit (1);
305	308	};
306	309

309	312	if ((f_str->aa1x =(unsigned char *)calloc((size_t)ppst->maxlen+2,
310	313	sizeof(unsigned char)))
311	314	== NULL) {
312		fprintf (stderr, " *** cannot allocate aa1x array %d\n", ppst->maxlen+2);
	315	fprintf (stderr, "*** error [%s:%d] - cannot allocate aa1x array %d\n",
	316	__FILE__, __LINE__, ppst->maxlen+2);
313	317	exit (1);
314	318	}
315	319	f_str->aa1x++;

324	328	maxn0 = n0 + 4;
325	329	if ((ss = (struct swstr *) calloc (maxn0, sizeof (struct swstr)))
326	330	== NULL) {
327		fprintf (stderr, " *** cannot allocate ss array %3d\n", n0);
	331	fprintf (stderr, "*** error [%s:%d] - cannot allocate ss array %3d\n",
	332	__FILE__, __LINE__, n0);
328	333	exit (1);
329	334	}
330	335	ss++;

335	340
336	341	/* initialize variable (-S) pam matrix */
337	342	if ((f_str->waa_s= (int *)calloc((nsq+1)*(n0+1),sizeof(int))) == NULL) {
338		fprintf(stderr,"*** error [%s:%d] cannot allocate waa_s array %3d\n",
	343	fprintf(stderr,"*** error [%s:%d] - cannot allocate waa_s array %3d\n",
339	344	__FILE__, __LINE__, nsq*n0);
340	345	exit(1);
341	346	}
342	347
343	348	/* initialize pam2p[1] pointers */
344	349	if ((f_str->pam2p[1]= (int *)calloc((n0+1),sizeof(int ))) == NULL) {
345		fprintf(stderr,"*** error [%s:%d] cannot allocate pam2p[1] array %3d\n",
	350	fprintf(stderr,"*** error [%s:%d] - cannot allocate pam2p[1] array %3d\n",
346	351	__FILE__, __LINE__, n0);
347	352	exit(1);
348	353	}
349	354
350	355	pam2p = f_str->pam2p[1];
351	356	if ((pam2p[0]=(int *)calloc((nsq+1)*(n0+1),sizeof(int))) == NULL) {
352		fprintf(stderr,"*** error [%s:%d] cannot allocate pam2p[1][] array %3d\n",
	357	fprintf(stderr,"*** error [%s:%d] - cannot allocate pam2p[1][] array %3d\n",
353	358	__FILE__, __LINE__, nsq*n0);
354	359	exit(1);
355	360	}

360	365
361	366	/* initialize universal (alignment) matrix */
362	367	if ((f_str->waa_a= (int *)calloc((nsq+1)*(n0+1),sizeof(int))) == NULL) {
363		fprintf(stderr,"*** error [%s:%d] cannot allocate waa_a struct %3d\n",
	368	fprintf(stderr,"*** error [%s:%d] - cannot allocate waa_a struct %3d\n",
364	369	__FILE__, __LINE__, nsq*n0);
365	370	exit(1);
366	371	}
367	372
368	373	/* initialize pam2p[0] pointers */
369	374	if ((f_str->pam2p[0]= (int *)calloc((n0+1),sizeof(int ))) == NULL) {
370		fprintf(stderr,"*** error [%s:%d] cannot allocate pam2p[1] array %3d\n",
	375	fprintf(stderr,"*** error [%s:%d] - cannot allocate pam2p[1] array %3d\n",
371	376	__FILE__, __LINE__, n0);
372	377	exit(1);
373	378	}
374	379
375	380	pam2p = f_str->pam2p[0];
376	381	if ((pam2p[0]=(int *)calloc((nsq+1)*(n0+1),sizeof(int))) == NULL) {
377		fprintf(stderr,"*** error [%s:%d] cannot allocate pam2p[1][] array %3d\n",
	382	fprintf(stderr,"*** error [%s:%d] - cannot allocate pam2p[1][] array %3d\n",
378	383	__FILE__, __LINE__, nsq*n0);
379	384	exit(1);
380	385	}

527	532	*f_arg = NULL;
528	533	}
529	534	else {
530		fprintf(stderr, "* error [%s:%d] close_work() with NULL f_str *\n",
	535	fprintf(stderr, "* error [%s:%d] - close_work() with NULL f_str *\n",
531	536	__FILE__, __LINE__);
532	537	}
533	538	}

615	620	}
616	621
617	622	if (n0+n1+1 >= MAXDIAG) {
618		fprintf(stderr,"*** error [%s:%d] n0,n1 too large: %d + %d (%d) > %d \n",
	623	fprintf(stderr,"*** error [%s:%d] - n0,n1 too large: %d + %d (%d) > %d \n",
619	624	__FILE__, __LINE__, n0,n1,n0+n1+1,MAXDIAG);
620	625	rst->score[0] = rst->score[1] = rst->score[2] = -1;
621	626	return;

1136	1141
1137	1142	#ifdef DEBUG
1138	1143	if (window > f_str->bss_size) {
1139		fprintf(stderr,"*** error [%s:%d] dropnfa.c:dmatch window [%d] out of range [%d]\n",
	1144	fprintf(stderr,"*** error [%s:%d] - dmatch window [%d] out of range [%d]\n",
1140	1145	__FILE__, __LINE__, window, f_str->bss_size);
1141	1146	window = f_str->bss_size - 4;
1142	1147	}

1204	1209
1205	1210	band = up-low+1;
1206	1211	if (band < 1) {
1207		fprintf(stderr,"*** error [%s:%d] low > up is unacceptable!: M: %d N: %d l/u: %d/%d\n",
	1212	fprintf(stderr,"*** error [%s:%d] - low > up is unacceptable!: M: %d N: %d l/u: %d/%d\n",
1208	1213	__FILE__, __LINE__, M, N, low, up);
1209	1214	return 0;
1210	1215	}

1346	1351
1347	1352	/* now we need alignment storage - get it */
1348	1353	if ((cur_ares->res = (int *)calloc((size_t)max_res,sizeof(int)))==NULL) {
1349		fprintf(stderr,"*** error [%s:%d] cannot allocate alignment results array %d\n",
	1354	fprintf(stderr,"*** error [%s:%d] - cannot allocate alignment results array %d\n",
1350	1355	__FILE__, __LINE__, max_res);
1351	1356	exit(1);
1352	1357	}

1384	1389	local_aa1 = (unsigned char *)aa1;
1385	1390	if (l_min > 0 || l_max < n1 - 1) {
1386	1391	if (l_max - l_min < 0) {
1387		fprintf(stderr,"*** error [%s:%d] l_min: %d > l_max %d\n",__FILE__, __LINE__, l_min,l_max);
	1392	fprintf(stderr,"*** error [%s:%d] - l_min: %d > l_max %d\n",__FILE__, __LINE__, l_min,l_max);
1388	1393	exit(1);
1389	1394	}
1390	1395	if ((local_aa1 = (unsigned char *)calloc(l_max - l_min +2,sizeof(unsigned char *)))==NULL) {
1391		fprintf(stderr,"*** error [%s:%d] Cannot allocate local_aa1\n",__FILE__, __LINE__);
	1396	fprintf(stderr,"*** error [%s:%d] - cannot allocate local_aa1\n",__FILE__, __LINE__);
1392	1397	exit(1);
1393	1398	}
1394	1399

1564	1569
1565	1570	window = min (n1, ppst->param_u.fa.optwid);
1566	1571	if (window > f_str->bss_size) {
1567		fprintf(stderr,"*** error [%s:%d] walign window [%d] out of range [%d]\n",
	1572	fprintf(stderr,"*** error [%s:%d] - walign window [%d] out of range [%d]\n",
1568	1573	__FILE__, __LINE__, window, f_str->bss_size);
1569	1574	window = f_str->bss_size - 4;
1570	1575	}

1579	1584	a_res->n1 = n1;
1580	1585
1581	1586	if (score <=0) {
1582		fprintf(stderr,"*** [%s:%d] n0/n1: %d/%d hoff: %d window: %d\n",
	1587	fprintf(stderr,"*** [%s:%d] - score <= 0 - n0/n1: %d/%d hoff: %d window: %d\n",
1583	1588	__FILE__, __LINE__, n0, n1, hoff, window);
1584	1589	return 0;
1585	1590	}

2177	2182	have_ares = 0x3; /* set 0x2 bit to indicate local copy */
2178	2183
2179	2184	if ((a_res = (struct a_res_str *)calloc(1, sizeof(struct a_res_str)))==NULL) {
2180		fprintf(stderr,"*** error [%s:%d] Cannot allocate a_res", __FILE__, __LINE__);
	2185	fprintf(stderr,"*** error [%s:%d] - cannot allocate a_res", __FILE__, __LINE__);
2181	2186	return NULL;
2182	2187	}
2183	2188

2203	2208
2204	2209	#ifdef DEBUG
2205	2210	if (adler32(1L,aa1,n1) != adler32_crc) {
2206		fprintf(stderr,"*** error [%s:%d] adler32_crc mismatch n1: %d\n",__FILE__, __LINE__, n1);
	2211	fprintf(stderr,"*** error [%s:%d] - adler32_crc mismatch n1: %d\n",__FILE__, __LINE__, n1);
2207	2212	}
2208	2213	#endif
2209	2214

+1

-1

src/dropnnw2.c

574	574	* be rerun with 16 bits. If it is more, and we have tried at least
575	575	* 500 sequences, we switch off the 8-bit mode.
576	576	*/
577		if (score == OVERFLOW) {
	577	if (score == OVERFLOW_SCORE) {
578	578	f_str->done_16bit++;
579	579	if(f_str->done_8bit>500 && (3*f_str->done_16bit)>(f_str->done_8bit))
580	580	f_str->try_8bit = 0;

+8

-8

src/faatran.c

37	37
38	38	*/
39	39	static
40		char *AA1="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
	40	char *AA1="FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
41	41	/*
42	42	Starts = ---M---------------M---------------M----------------------------
43	43	Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG

415	415	aacmap[ii]= *aasmap++;
416	416	}
417	417
418
419		for (i=0; i<64; i++) {
420		fprintf(stderr,"'%c',",aacmap[i]);
421		if ((i%16)==15) fputc('\n',stderr);
422		}
423		fputc('\n',stderr);
424
	418	if (debug) {
	419	for (i=0; i<64; i++) {
	420	fprintf(stderr,"'%c',",aacmap[i]);
	421	if ((i%16)==15) fputc('\n',stderr);
	422	}
	423	fputc('\n',stderr);
	424	}
425	425	}
426	426	for (i=0; i<64; i++) {
427	427	aamap[i]=aascii[aacmap[i]];
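
The `AA1` string changed in this file is a 64-entry codon table whose first base runs through `T`, `C`, `A`, `G` in blocks of 16, as the `Base1` comment line shows. A sketch of how such a table can be indexed, assuming the second and third bases follow the same TCAG order (the usual NCBI layout) and assuming the new table reads TGA as `U`; `codon_table`, `base_rank`, and `translate_codon` are hypothetical names, not functions from `faatran.c`:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed table: NCBI TCAG order, stops as '*', TGA shown as 'U'. */
static const char codon_table[] =
    "FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/* Rank of a base in "TCAG" (T=0, C=1, A=2, G=3), or -1 if not a base. */
static int
base_rank(char b)
{
    static const char order[] = "TCAG";
    const char *p;

    if (b == '\0') return -1;   /* strchr would match the terminator */
    p = strchr(order, toupper((unsigned char)b));
    return p ? (int)(p - order) : -1;
}

/* Translate the first three bases of codon; 'X' for ambiguous input. */
static char
translate_codon(const char *codon)
{
    int r0, r1, r2;

    assert(codon != NULL && strlen(codon) >= 3);

    r0 = base_rank(codon[0]);
    r1 = base_rank(codon[1]);
    r2 = base_rank(codon[2]);
    if (r0 < 0 || r1 < 0 || r2 < 0) return 'X';

    /* 16 entries per first base, 4 per second base */
    return codon_table[16 * r0 + 4 * r1 + r2];
}
```

Under these assumptions `translate_codon("ATG")` indexes entry 35 and `translate_codon("TGA")` indexes entry 14, which is the position this commit appears to change.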

+38

-18

src/initfa.c

497	497	char *iprompt2=" database file name: ";
498	498
499	499	#ifdef PCOMPLIB
500		char *verstr="36.3.8g Dec, 2017 MPI";
501		#else
502		char *verstr="36.3.8g Dec, 2017";
	500	char *verstr="36.3.8h Aug, 2019 MPI";
	501	#else
	502	char *verstr="36.3.8h Aug, 2019";
503	503	#endif
504	504
505	505	static int mktup=3;

779	779	ppst->pam2[0][ix_j][p_i] = ppst->pam2[0][ix_i][p_i];
780	780	ppst->pam2[0][p_i][ix_j] = ppst->pam2[0][p_i][ix_i];
781	781	}
782		}
	782	p_i = pascii['*'];
	783	ppst->pam2[0][ix_j][p_i] = ppst->pam2[0][p_i][ix_j] = ppst->pam2[0][p_i][p_i];
	784	}
783	785	else {
784	786	pascii['U'] = pascii['C'];
785	787	pascii['u'] = pascii['c'];

1289	1291	}
1290	1292	}
1291	1293
1292		static char my_opts[] = "1BIM:ox:y:N:";
	1294	/* Extended options:
	1295	-X1 - use the init1 score, rather than initn, for statistics and ordering results
	1296	-Xa - only report annotation information in -m 8CB output (for later merge)
	1297	-Xb - report z-score, not bit-score
	1298	-XB - use blast identities
	1299	-XI - ensure that identities are not rounded to 100%
	1300	-XM: - specify memory limits for database buffering
	1301	-XN:[+S] - treat N:N/X:X as similar as well as identical
	1302	-Xo - use initn score, not opt score, for statistics and ordering results
	1303	-Xx: - penalties for X:X, X:not-X match
	1304	-Xy: - width of band for optimized scores
	1305	*/
	1306
	1307	static char my_opts[] = "1aBbIM:ox:y:N:";
1293	1308
1294	1309	void
1295	1310	parse_ext_opts(char *opt_arg, int pgm_id, struct mngmsg *m_msp, struct pstruct *ppst) {
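
The comment block added here lists the `-X` extended options, and the option string gains `a` and `b` while `B` changes meaning in the `switch` further down. A reduced sketch of that dispatch, assuming the option string follows the getopt convention that a trailing `:` marks an option with an argument; `ext_flags`, `ext_opt_kind`, and `apply_ext_opt` are hypothetical, and only the four field names and the option letters are taken from the diff:

```c
#include <assert.h>
#include <string.h>

/* Copy of the new option string from the hunk above. */
static const char my_opts_demo[] = "1aBbIM:ox:y:N:";

/* Hypothetical subset of the flags that the -X options set. */
struct ext_flags {
    int m8_show_annot;  /* -Xa */
    int blast_ident;    /* -XB */
    int z_bits;         /* -Xb clears this: report z-score, not bit-score */
    int tot_ident;      /* -XI */
};

/* -1: unknown option, 0: plain flag, 1: option that takes an argument
   (assumes getopt-style ':' markers in the option string). */
static int
ext_opt_kind(char opt)
{
    const char *p;

    if (opt == '\0' || opt == ':') return -1;
    p = strchr(my_opts_demo, opt);
    if (p == NULL) return -1;
    return p[1] == ':';
}

/* Apply one argument-free option; returns 1 if it was handled here. */
static int
apply_ext_opt(struct ext_flags *f, char opt)
{
    assert(f != NULL);

    switch (opt) {
    case 'a': f->m8_show_annot = 1; break;
    case 'B': f->blast_ident = 1; break;   /* was z_bits = 0 before */
    case 'b': f->z_bits = 0; break;
    case 'I': f->tot_ident = 1; break;
    default:  return 0;
    }
    return 1;
}
```

The point of the sketch is the behavior change: scripts that passed `-XB` to get z-scores would now need `-Xb`, since `-XB` selects BLAST-style identities instead.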

1309	1324	ppst->param_u.fa.iniflag=1;
1310	1325	}
1311	1326	break;
1312		case 'B': m_msp->z_bits = 0; break;
	1327
	1328	case 'a': m_msp->m8_show_annot = 1; break;
	1329
	1330	case 'B': m_msp->blast_ident = 1; break;
	1331
	1332	case 'b': m_msp->z_bits = 0; break;
1313	1333	case 'I':
1314	1334	m_msp->tot_ident = 1;
1315	1335	/*

2865	2885
2866	2886	for (i=0; i< ppst->nsq; i++) {
2867	2887	if (ppst->pam2[0][0][i] > -1000) {
2868		fprintf(stderr," * ERROR * pam2[0][0][%d/%c] == %d\n",
2869		i,NCBIstdaa[i],ppst->pam2[0][0][i]);
	2888	fprintf(stderr," * error[%s:%d]* pam2[0][0][%d/%c] == %d\n",
	2889	__FILE__, __LINE__, i,NCBIstdaa[i],ppst->pam2[0][0][i]);
2870	2890	good_params = 0;
2871	2891	}
2872	2892	if (ppst->pam2[0][i][0] > -1000) {
2873		fprintf(stderr," * ERROR * pam2[0][%d/%c][0] == %d\n",
2874		i,NCBIstdaa[i],ppst->pam2[0][i][0]);
	2893	fprintf(stderr," *** error[%s:%d] (validate_params)- pam2[0][%d/%c][0] == %d\n",
	2894	__FILE__,__LINE__,i,NCBIstdaa[i],ppst->pam2[0][i][0]);
2875	2895	good_params = 0;
2876	2896	}
2877	2897	}

2880	2900	if (ppst->ext_sq_set) {
2881	2901	for (i=0; i< ppst->nsqx; i++) {
2882	2902	if (ppst->pam2[1][0][i] > -1000) {
2883		fprintf(stderr," * ERROR * pam2[1][0][%d] == %d\n",
2884		i,ppst->pam2[1][0][i]);
	2903	fprintf(stderr," *** error[%s:%d] (validate_params) - pam2[1][0][%d] == %d\n",
	2904	__FILE__, __LINE__, i,ppst->pam2[1][0][i]);
2885	2905	good_params = 0;
2886	2906	}
2887	2907	if (ppst->pam2[1][i][0] > -1000) {
2888		fprintf(stderr," * ERROR * pam2[1][%d][0] == %d\n",
2889		i,ppst->pam2[1][i][0]);
	2908	fprintf(stderr," *** error[%s:%d] (validate_params) - pam2[1][%d][0] == %d\n",
	2909	__FILE__, __LINE__, i,ppst->pam2[1][i][0]);
2890	2910	good_params = 0;
2891	2911	}
2892	2912	}

2895	2915	/* check for valid residues in query */
2896	2916	for (i=0; i<n0; i++) {
2897	2917	if (aa0[i] > ppst->nsq_e && aa0[i] != ESS) {
2898		fprintf(stderr," * ERROR * aa0[%d] = %c[%d > %d] out of range\n",
2899		i, aa0[i], aa0[i], ppst->nsq_e);
	2918	fprintf(stderr," *** error [%s:%d] (validate_params) - aa0[%d] = %c[%d > %d] out of range\n",
	2919	__FILE__,__LINE__,i, aa0[i], aa0[i], ppst->nsq_e);
2900	2920	good_params = 0;
2901	2921	}
2902	2922	}
2903	2923
2904	2924	for (i=0; i<128; i++) {
2905	2925	if (lascii[i] < NA && lascii[i] > ppst->nsq_e) {
2906		fprintf(stderr," * ERROR * lascii [%c|%d] = %d > %d out of range\n",
2907		i, i, lascii[i], ppst->nsq_e);
	2926	fprintf(stderr," *** error[%s:%d] (validate_params) - lascii [%c|%d] = %d > %d out of range\n",
	2927	__FILE__, __LINE__, i, i, lascii[i], ppst->nsq_e);
2908	2928	good_params = 0;
2909	2929	}
2910	2930

+10

-9

src/lib_sel.c

72	72	if ((bp=strchr(tname,' '))!=NULL) *bp='\0';
73	73
74	74	if ((tptr=fopen(tname,"r"))==NULL) {
75		fprintf(stderr," could not open file of names: %s\n",tname);
	75	fprintf(stderr,"*** error [%s:%d] could not open file of names: %s\n",__FILE__,__LINE__,tname);
76	76	return NULL;
77	77	}
78	78

108	108	if (strlen(flstr)> (size_t)0) {
109	109	chlen = MAX_CH*MAX_FN;
110	110	if ((chtmp=charr=calloc((size_t)chlen,sizeof(char)))==NULL) {
111		fprintf(stderr,"cannot allocate choice file array\n");
	111	fprintf(stderr,"*** error [%s:%d] cannot allocate choice file array\n",__FILE__,__LINE__);
112	112	goto l1;
113	113	}
114	114	chlen--;
115	115	if ((fch=fopen(flstr,"r"))==NULL) {
116		fprintf(stderr," cannot open choice file: %s\n",flstr);
	116	fprintf(stderr,"*** error [%s:%d] cannot open choice file: %s\n",__FILE__,__LINE__,flstr);
117	117	goto l1;
118	118	}
119	119	fprintf(stderr,"\n Choose sequence library:\n\n");

185	185	int new_abbr,ich, nch; /* use new multi-letter abbr */
186	186	int ltmp;
187	187	FILE *fch;
188		struct lib_struct *cur_lib_p = NULL;
	188	struct lib_struct *cur_lib_p = NULL, *tmp_lib_p;
189	189
190	190	new_abbr = 0;
191	191	*ltitle = '\0';

195	195	}
196	196	else {
197	197	if (*flstr=='\0') {
198		fprintf(stderr," abbrv. list request but FASTLIBS undefined, cannot use %s\n",lname);
	198	fprintf(stderr,"*** error [%s:%d] abbrv. list request but FASTLIBS undefined, cannot use %s\n",__FILE__,__LINE__,lname);
199	199	exit(1);
200	200	}
201	201

217	217
218	218	if (strlen(flstr) > (size_t)0) {
219	219	if ((fch=fopen(flstr,"r"))==NULL) {
220		fprintf(stderr," cannot open choice file: %s\n",flstr);
	220	fprintf(stderr,"*** error [%s:%d] cannot open choice file: %s\n",__FILE__,__LINE__,flstr);
221	221	return NULL;
222	222	}
223	223	}

232	232
233	233	/* if !new_abbr, match on one letter with ulindex() */
234	234	if (!new_abbr) {
235		if (*bp=='+') continue; /* not a &lib& */
	235	if (*bp=='+') continue; /* not a +lib+ */
236	236	else if (ulindex(lname,bp)!=NULL) {
237	237	if (ltitle[0] == '\0') {
238	238	strncpy(ltitle,line,MAX_STR);

242	242	strncat(ltitle,",\n ",MAX_STR-ltmp);
243	243	strncat(ltitle,line,MAX_STR-ltmp-4);
244	244	}
245		cur_lib_p = get_lnames(bp+1, cur_lib_p);
	245	tmp_lib_p = get_lnames(bp+1, cur_lib_p);
	246	if (tmp_lib_p) { cur_lib_p = tmp_lib_p;}
246	247	}
247	248	}
248	249	else {

267	268	}
268	269	*bp1='+';
269	270	}
270		else fprintf(stderr,"%s missing final '+'\n",bp);
	271	else fprintf(stderr,"*** error [%s:%d] %s missing final '+'\n",__FILE__,__LINE__,bp);
271	272	}
272	273	}
273	274	}

+4

-1

src/map_db.c

18	18	governing permissions and limitations under the License.
19	19	*/
20	20
21		/* input is a libtype 1,5, or 6 sequence database */
	21	/* input is a lib_type 1,5, or 6 sequence database (lib_type specified after filename),
	22	e.g. 'swissprot.lseg 1' */
	23	/* map_db -n specifies a DNA database */
	24
22	25	/* output is a BLAST2 formatdb type index file */
23	26
24	27	/* format of the index file:

+30

-13

src/mshowalign2.c

155	155	int nc, lc, maxc;
156	156	double lzscore, lzscore2, lbits;
157	157	struct a_struct l_aln, *l_aln_p;
158		float percent, gpercent;
	158	float percent, gpercent, ng_percent, disp_percent, disp_similar;
	159	int disp_alen;
159	160	/* strings, lengths for conventional alignment */
160	161	char *seqc0, *seqc0a, *seqc1, *seqc1a, *seqca;
161	162	int *cumm_seq_score;

489	490
490	491	if (lc > 0) {
491	492	percent = (100.0*(float)l_aln_p->nident)/(float)lc;
492		}
493		else { percent = -1.00; }
	493	ng_percent = (100.0*(float)l_aln_p->nident)/(float)(lc-(l_aln_p->ngap_q + l_aln_p->ngap_l));
	494	}
	495	else { percent = ng_percent = -1.00; }
494	496
495	497	fprintf (fp, "a {\n");
496	498	if (annot_var_dyn->string[0]) {

533	535
534	536	if (cur_ares_p->score_delta > 0) score_delta -= cur_ares_p->score_delta;
535	537
536		percent = calc_fpercent_id(100.0, l_aln_p->nident,lc,m_msp->tot_ident, -1.0);
	538	disp_percent = percent = calc_fpercent_id(100.0, l_aln_p->nident,lc,m_msp->tot_ident, -1.0);
	539	disp_similar = calc_fpercent_id(100.0, l_aln_p->nsim, lc, m_msp->tot_ident, -1.0);
	540	disp_alen = lc;
537	541
538	542	ngap = l_aln_p->ngap_q + l_aln_p->ngap_l;
	543	ng_percent = calc_fpercent_id(100.0, l_aln_p->nident,lc-ngap,m_msp->tot_ident, -1.0);
	544	if (m_msp->blast_ident) {
	545	disp_percent = ng_percent;
	546	disp_similar = calc_fpercent_id(100.0, l_aln_p->npos, lc-ngap, m_msp->tot_ident, -1.0);
	547	disp_alen = lc - ngap;
	548	}
	549
539	550	#ifndef SHOWSIM
540		gpercent = calc_fpercent_id(100.0,l_aln_p->nident,lc-ngap,m_msp->tot_ident, -1.0);
	551	gpercent = ng_percent;
541	552	#else
542		gpercent = calc_fpercent_id(100.0,l_aln_p->nsim,lc,m_msp->tot_ident, -1.0);
	553	gpercent = disp_similar;
543	554	#endif
544	555
545	556	lsw_score = cur_ares_p->sw_score + score_delta;
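
The hunk above switches the displayed identity and alignment length between two conventions: identities over the full alignment length, or (when `blast_ident` is set) identities over the alignment length minus gaps. A minimal sketch of that arithmetic follows. The helper `pct_identity` and its parameters are hypothetical and are not part of the FASTA sources; it does not reproduce `calc_fpercent_id()`, whose other arguments (`tot_ident`, the fallback value) are ignored here.

```c
/* Hypothetical helper, for illustration only (not in the FASTA sources).
 * nident       - number of identical aligned positions
 * aln_len      - alignment length including gap columns
 * ngap         - gap columns in query plus library
 * exclude_gaps - non-zero to drop gap columns from the denominator
 * Returns percent identity, or -1.0 when the denominator is not positive. */
double pct_identity(int nident, int aln_len, int ngap, int exclude_gaps) {
  int denom = exclude_gaps ? aln_len - ngap : aln_len;

  if (denom <= 0) {
    return -1.0;
  }
  return 100.0 * (double)nident / (double)denom;
}
```

With 80 identities in a 100-column alignment that contains 20 gap columns, the two conventions give 80% and 100% respectively, which is why the displayed length changes along with the percentage.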

663	674	if (m_msp->markx & MX_HTML) {
664	675	fprintf(fp,"<!-- ANNOT_START \"%s\" -->",link_name);}
665	676	/* ensure that last character is "\n" */
666		if (annot_var_dyn->string[strlen(annot_var_dyn->string)-1] != '\n') {
667		annot_var_dyn->string[strlen(annot_var_dyn->string)-1] = '\n';
668		}
669		fputs(annot_var_dyn->string, fp);
	677	if (!m_msp->m8_show_annot) {
	678	if (annot_var_dyn->string[strlen(annot_var_dyn->string)-1] != '\n') {
	679	annot_var_dyn->string[strlen(annot_var_dyn->string)-1] = '\n';
	680	}
	681	fputs(annot_var_dyn->string, fp);
	682	}
	683	else { fputs("\n",fp);}
	684
670	685	if (m_msp->markx & MX_HTML) {fputs("<!-- ANNOT_STOP -->",fp);}
671	686	}
672	687

745	760	do_show(fp, m_msp->n0, bbp->seq->n1, lsw_score, name0, name1, nml,
746	761	link_name,
747	762	m_msp, ppst, seqc0, seqc0a, seqc1, seqc1a, seqca, cumm_seq_score,
748		nc, percent, gpercent, lc, l_aln_p, annot_var_dyn->string,
	763	nc, disp_percent, gpercent, disp_alen, l_aln_p, annot_var_dyn->string,
749	764	m_msp->annot_p, bbp->seq->annot_p);
750	765
751	766	/* display the encoded alignment left over from showbest()*/

808	823	int tmp;
809	824
810	825	if (m_msp->markx & MX_AMAP && (m_msp->markx & MX_ATYPE)==7)
	826	/* show text graphic of alignment (very rarely used) */
811	827	disgraph(fp, n0, n1, percent, score,
812	828	aln->amin0, aln->amin1, aln->amax0, aln->amax1, m_msp->sq0off,
813	829	name0, name1, nml, aln->llen, m_msp->markx);
814	830	else if (m_msp->markx & MX_M10FORM) {
	831	/* old tagged/parse-able format */
815	832	if (ppst->sw_flag && m_msp->arelv>0)
816	833	fprintf(fp,"; %s_score: %d\n",m_msp->f_id1,score);
817	834	fprintf(fp,"; %s_ident: %5.3f\n",m_msp->f_id1,percent/100.0);

826	843	seqc0, seqc0a, seqc1, seqc1a, seqca, cumm_seq_score, nc,
827	844	n0, n1, name0, name1, nml, aln);
828	845	}
829		else {
	846	else { /* all "normal" alignment formats */
830	847	if (!(m_msp->markx & MX_MBLAST)) {
831	848	#ifndef LALIGN
832	849	fprintf(fp,"%s score: %d; ",m_msp->alabel, score);

847	864	annot_var_s, q_annot_p, l_annot_p);
848	865	}
849	866
850		if (m_msp->markx & MX_AMAP && (m_msp->markx & MX_ATYPE)!=7) {
	867	if ((m_msp->markx & MX_AMAP) && ((m_msp->markx & MX_ATYPE)!=MX_ATYPE)) {
851	868	fputc('\n',fp);
852	869	tmp = n0;
853	870

+96

-17

src/mshowbest.c

90	90	void w_abort (char *p, char *p1);
91	91
92	92	extern double zs_to_bit(double, int, int);
	93
	94	void dominfo_to_str(struct dyn_string_str *d, struct annot_str *annot);
93	95
94	96	/* showbest() shows a list of high scoring sequence descriptions, and
95	97	their rst.scores. If -m 9, then an additional complete set of

136	138	struct rstruct rst;
137	139	int l_score0, ngap;
138	140	double lzscore, lzscore2, lbits;
139		float percent, gpercent, ng_percent;
	141	float percent, gpercent, ng_percent, disp_percent, disp_similar;
	142	int disp_alen;
140	143	struct a_struct *aln_p;
141	144	struct a_res_str *cur_ares_p;
142	145	struct rstruct *rst_p;
143	146	int gi_num;
144	147	char html_pre_E[120], html_post_E[120];
145	148	int have_lalign = 0;
	149	struct dyn_string_str *dominfo_dstr;
146	150
147	151	struct lmf_str *m_fptr;
148	152

241	245	/* display number of hits for -m 8C (Blast Tab-commented format) */
242	246	if (m_msp->markx & MX_M8COMMENT) {
243	247	/* line below copied from BLAST+ output */
244		fprintf(fp,"# Fields: query id, subject id, %% identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score");
	248	if (m_msp->markx & MX_M8_BTAB_LEN) {
	249	fprintf(fp,"# Fields: query id, query length, subject id, subject length, %% identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score");
	250	}
	251	else {
	252	fprintf(fp,"# Fields: query id, subject id, %% identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score");
	253	}
	254
245	255	if (ppst->zsflag > 20) {fprintf(fp,", eval2");}
246	256	if (m_msp->show_code & (SHOW_CODE_ALIGN+SHOW_CODE_CIGAR)) { fprintf(fp,", aln_code");}
247	257	else if ((m_msp->show_code & SHOW_CODE_BTOP)==SHOW_CODE_BTOP) { fprintf(fp,", BTOP");}

328	338	for (ib=istart; ib<istop; ib++) {
329	339	bbp = bptr[ib];
330	340	if (ppst->do_rep) {
331		bbp->repeat_thresh =
332		min(E1_to_s(ppst->e_cut_r, m_msp->n0, bbp->seq->n1,ppst->zdb_size, m_msp->pstat_void),
333		bbp->rst.score[ppst->score_ix]);
	341	if (bbp->rst.escore > ppst->e_cut_r) { /* for poor alignment scores, don't look for more */
	342	bbp->repeat_thresh = bbp->rst.score[ppst->score_ix] * 10;
	343	}
	344	else {
	345	bbp->repeat_thresh =
	346	min(E1_to_s(ppst->e_cut_r, m_msp->n0, bbp->seq->n1,ppst->zdb_size, m_msp->pstat_void),
	347	bbp->rst.score[ppst->score_ix]);
	348	}
334	349	}
335	350
336	351	#ifdef DEBUG

518	533	}
519	534	else if (m_msp->markx & MX_M8OUT) { /* MX_M8OUT -- provide query, library */
520	535	if (first_line) {first_line = 0;}
521		fprintf (fp,"%s\t%s",m_msp->qtitle,bline_p);
	536	if (m_msp->markx & MX_M8_BTAB_LEN) {
	537	fprintf (fp,"%s\t%d\t%s\t%d",m_msp->qtitle,m_msp->n0,bline_p,bbp->seq->n1);
	538	}
	539	else {
	540	fprintf (fp,"%s\t%s",m_msp->qtitle,bline_p);
	541	}
522	542	}
523	543	else if (m_msp->markx & MX_MBLAST2) { /* blast "Sequences producing" */
524	544	if (first_line) {first_line = 0;}

536	556	annot_str_len = cur_ares_p->annot_code_n;
537	557
538	558	ngap = cur_ares_p->aln.ngap_q + cur_ares_p->aln.ngap_l;
539		percent = calc_fpercent_id(100.0,aln_p->nident,aln_p->lc, m_msp->tot_ident, -100.0);
	559	disp_percent = percent = calc_fpercent_id(100.0,aln_p->nident,aln_p->lc, m_msp->tot_ident, -100.0);
540	560	ng_percent = calc_fpercent_id(100.0,aln_p->nident,aln_p->lc-ngap, m_msp->tot_ident, -100.0);
	561	disp_similar = calc_fpercent_id(100.0, cur_ares_p->aln.nsim, aln_p->lc, m_msp->tot_ident, -100.0);
	562	disp_alen = aln_p->lc;
	563	if (m_msp->blast_ident) {
	564	disp_percent = ng_percent;
	565	disp_similar = calc_fpercent_id(100.0, cur_ares_p->aln.npos, aln_p->lc - ngap, m_msp->tot_ident, -100.0);
	566	disp_alen = aln_p->lc - ngap;
	567	}
541	568
542	569	#ifndef SHOWSIM
543		gpercent = calc_fpercent_id(100.0, aln_p->nident, aln_p->lc-ngap, m_msp->tot_ident, -100.0);
	570	gpercent = ng_percent;
544	571	#else
545		gpercent = calc_fpercent_id(100.0, cur_ares_p->aln.nsim, aln_p->lc, m_msp->tot_ident, -100.0);
	572	gpercent = disp_similar;
546	573	#endif /* SHOWSIM */
547	574
548	575	if (m_msp->show_code != SHOW_CODE_ID && m_msp->show_code != SHOW_CODE_IDD) { /* show more complete info than just identity */

563	590	/* sequence coordinate min max min max */
564	591	if (!(m_msp->markx & MX_M8OUT)) {
565	592	fprintf(fp,"\t%5.3f %5.3f %4d %4d %4ld %4ld %4ld %4ld %4ld %4ld %4ld %4ld %3d %3d %3d",
566		percent/100.0,gpercent/100.0,
	593	disp_percent/100.0,gpercent/100.0,
567	594	cur_ares_p->sw_score,
568		aln_p->lc,
	595	disp_alen,
569	596	aln_p->d_start0,aln_p->d_stop0,
570	597	aln_p->q_start_off, aln_p->q_end_off,
571	598	aln_p->d_start1,aln_p->d_stop1,

581	608	}
582	609	else { /* MX_M8OUT -- blast order, tab separated */
583	610	fprintf(fp,"\t%.2f\t%d\t%d\t%d\t%ld\t%ld\t%ld\t%ld\t%.2g\t%.1f",
584		ng_percent,aln_p->lc,aln_p->nmismatch,
	611	ng_percent,aln_p->lc-ngap,aln_p->nmismatch,
585	612	aln_p->ngap_q + aln_p->ngap_l+aln_p->nfs,
586	613	aln_p->d_start0, aln_p->d_stop0,
587	614	aln_p->d_start1, aln_p->d_stop1,
588	615	zs_to_E(lzscore,n1,ppst->dnaseq,ppst->zdb_size,m_msp->db),
589	616	lbits);
	617
590	618	if (ppst->zsflag > 20) {
591	619	fprintf(fp,"\t%.2g",zs_to_E(lzscore2, n1, ppst->dnaseq, ppst->zdb_size, m_msp->db));
592	620	}
593	621	if ((m_msp->show_code & (SHOW_CODE_ALIGN+SHOW_CODE_CIGAR+SHOW_CODE_BTOP)) && seq_code_len > 0 && seq_code != NULL) {
594	622	fprintf(fp,"\t%s",seq_code);
	623
595	624	if (annot_str_len > 0 && annot_str != NULL) {
596	625	fprintf(fp,"\t%s",annot_str);
597	626	}
	627
	628	if (m_msp->show_code & SHOW_CODE_DOMINFO) {
	629	dominfo_dstr = init_dyn_string(1024,1024);
	630	if (m_msp->annot_p) {
	631	dominfo_to_str(dominfo_dstr,m_msp->annot_p);
	632	}
	633	if (bbp->seq->annot_p) {
	634	dominfo_to_str(dominfo_dstr,bbp->seq->annot_p);
	635	}
	636
	637	if (dominfo_dstr->string[0]) {
	638	fprintf(fp,"\t%s",dominfo_dstr->string);
	639	}
	640	free_dyn_string(dominfo_dstr);
	641	}
598	642	}
599	643	fprintf(fp,"\n");
600	644	}

602	646	else { /* !SHOW_CODE -> SHOW_ID or SHOW_IDD*/
603	647	#ifdef SHOWSIM
604	648	fprintf(fp," %5.3f %5.3f %4d",
605		percent/100.0,
606		(float)aln_p->nsim/(float)aln_p->lc,aln_p->lc);
	649	disp_percent/100.0,disp_similar/100.0,disp_alen);
607	650	#else
608		fprintf(fp," %5.3f %4d", percent/100.0,aln_p->lc);
	651	fprintf(fp," %5.3f %4d", disp_percent/100.0,disp_alen);
609	652	#endif
610	653	if (m_msp->markx & MX_HTML) {
611	654	if (cur_ares_p->index > 0) {

619	662	}
620	663	else { link_shown = 0;}
621	664
622		if ((m_msp->show_code & SHOW_CODE_ID) == SHOW_CODE_ID) {
	665	if ((m_msp->show_code & SHOW_CODE_ID) == SHOW_CODE_ID ) {
623	666	annot_str = cur_ares_p->annot_var_id;
624	667	}
625	668	else if ((m_msp->show_code & SHOW_CODE_IDD) == SHOW_CODE_IDD) {

628	671	else {
629	672	annot_str = NULL;
630	673	}
631		if (annot_str && annot_str[0]) {
	674	if (annot_str && annot_str[0] && (!m_msp->m8_show_annot || (m_msp->markx & MX_M8OUT))) {
632	675	fprintf(fp," %s",annot_str);
633	676	}
634	677	}

662	705
663	706	if (m_msp->markx & MX_HTML) fprintf(fp,"</pre><hr>\n");
664	707	}
	708
	709	/* dominfo_to_str() -- convert domain annotations to a |DX:1-100;C=PF12345~1 dyn_string */
	710	/* used for both query and subject strings */
	711	void
	712	dominfo_to_str(struct dyn_string_str *dominfo_dstr, struct annot_str *annots) {
	713	int i;
	714	char tmp_string[MAX_STR];
	715	struct annot_entry *annot;
	716	struct dyn_string_str *dyn_dom_str;
	717
	718	for (i=0; i < annots->n_annot; i++) {
	719
	720	annot = &annots->annot_arr_p[i];
	721
	722	if (annot->target) {
	723	if (annot->label == '-') {
	724	sprintf(tmp_string,"|XD:%ld-%ld;C=%s",annot->pos+1,annot->end+1,annot->comment);
	725	}
	726	else {
	727	sprintf(tmp_string,"|X%c:%ld-%ld;C=%s",annot->label, annot->pos+1,annot->end+1,annot->comment);
	728	}
	729	}
	730	else {
	731	if (annot->label == '-') {
	732	sprintf(tmp_string,"|DX:%ld-%ld;C=%s",annot->pos+1,annot->end+1,annot->comment);
	733	}
	734	else {
	735	sprintf(tmp_string,"|%cX:%ld-%ld;C=%s",annot->label, annot->pos+1,annot->end+1,annot->comment);
	736	}
	737
	738	}
	739
	740
	741	dyn_strcat(dominfo_dstr, tmp_string);
	742	}
	743	}
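
The new `dominfo_to_str()` above emits one `|DX:start-end;C=comment` field per annotation, with the `X` placed before the label for target (subject) annotations and after it for query annotations, and with a `-` label shown as `D`. The sketch below illustrates that field layout using a bounded `snprintf`. The function `format_dom_field` and its parameters are hypothetical and are not part of the FASTA sources; positions are assumed to be 0-based on input, as the `+1` in the committed code suggests.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of the field layout (not in the FASTA sources).
 * is_target selects "|X<label>:" (subject) versus "|<label>X:" (query);
 * a '-' label is written as 'D'.  pos/end are assumed 0-based.
 * Returns the snprintf() result: the length the field needs, which is
 * >= buf_len if the output was truncated. */
int format_dom_field(char *buf, size_t buf_len, int is_target, char label,
                     long pos, long end, const char *comment) {
  char shown = (label == '-') ? 'D' : label;

  if (is_target) {
    return snprintf(buf, buf_len, "|X%c:%ld-%ld;C=%s",
                    shown, pos + 1, end + 1, comment);
  }
  return snprintf(buf, buf_len, "|%cX:%ld-%ld;C=%s",
                  shown, pos + 1, end + 1, comment);
}
```

Unlike the `sprintf` calls in the hunk, this variant cannot write past the buffer when a comment is longer than expected; whether that matters for `MAX_STR`-sized buffers in practice is not something the diff shows.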

+1

-0

src/ncbl2_head.h

23	23
24	24	#define FORMATDBV3 3 /* formatdb version */
25	25	#define FORMATDBV4 4 /* formatdb version */
	26	#define FORMATDBV5 5 /* formatdb version */
26	27
27	28	#define NULLB '\0' /* sentinel byte */
28	29

+19

-2

src/ncbl2_mlib.c

79	79
80	80
81	81	/* ****************************************************************
82		This code reads NCBI Blast2 format databases from formatdb version 3 and 4
	82	This code reads NCBI Blast2 format databases from formatdb version 3 -- 5
83	83
84	84	(From NCBI) This section describes the format of the databases.
85	85

449	449	src_uint4_read(ifile,(unsigned *)&dbformat); /* get format DB version number */
450	450	src_uint4_read(ifile,(unsigned *)&dbtype); /* get 1 for protein/0 DNA */
451	451
452		if (dbformat != FORMATDBV3 && dbformat!=FORMATDBV4) {
	452	if (dbformat != FORMATDBV3 && dbformat!=FORMATDBV4 && dbformat!=FORMATDBV5) {
453	453	fprintf(stderr,"error - %s wrong formatdb version (%d/%d)\n",
454	454	tname,dbformat,FORMATDBV3);
455	455	return NULL;

787	787	int title_len;
788	788	char *title_str=NULL;
789	789	int date_len;
	790	char *pdb_title_str=NULL;
	791	int pdb_title_len;
790	792	char *date_str=NULL;
791	793	long ltmp;
792	794	int64_t l8tmp;
793	795	int i, tmp;
794	796	unsigned int *f_pos_arr;
795	797
	798	if (dbformat == FORMATDBV5) {
	799	src_uint4_read(ifile,(unsigned int *)&ltmp);
	800	}
	801
796	802	src_uint4_read(ifile,(unsigned *)&title_len);
797	803
798	804	if (title_len > 0) {

803	809	fread(title_str,(size_t)1,(size_t)title_len,ifile);
804	810	}
805	811
	812	if (dbformat == FORMATDBV5) {
	813	src_uint4_read(ifile,(unsigned int *)&pdb_title_len);
	814	if (pdb_title_len > 0) {
	815	if ((pdb_title_str = calloc((size_t)pdb_title_len+1,sizeof(char)))==NULL) {
	816	fprintf(stderr," cannot allocate pdb_title string (%d)\n",pdb_title_len);
	817	goto error_r;
	818	}
	819	fread(pdb_title_str,(size_t)1,(size_t)pdb_title_len,ifile);
	820	}
	821	}
	822
806	823	src_uint4_read(ifile,(unsigned *)&date_len);
807	824
808	825	if (date_len > 0) {

+56

-33

src/nmgetlib.c

52	52	4 - Intelligentics format
53	53	5 - NBRF/PIR VMS format
54	54	6 - GCG 2bit format
	55	7 - FASTQ format
	56	8 - accession script
55	57
56	58	10 - list of gi/acc's
57	59	11 - NCBI setdb/blastp (1.3.2) AA/NT
58	60	12 - NCBI setdb/blastp (2.0) AA/NT
59	61	16 - mySQL queries
60
	62
61	63	see file altlib.h to confirm numbers
62	64
63	65	*/

166	168	struct lmf_str *m_fptr=NULL;
167	169	int acc_off=0;
168	170	char fmt_term;
	171	char acc_script[MAX_LSTR];
169	172	struct lib_struct *next_lib_p, *this_lib_p, *tmp_lib_p;
170	173
171	174	om_fptr = lib_p->m_file_p;

177	180
178	181	wcnt = 0; /* number of times to ask for file name */
179	182
	183	/* check for library type */
	184	lib_type=0;
	185	if ((bp=strchr(lib_p->file_name,' '))!=NULL
	186	|| (bp=strchr(lib_p->file_name,'^'))!=NULL) {
	187	if (isdigit((int)(bp+1)[0])) { /* check for number for lib_type */
	188	*bp='\0';
	189	sscanf(bp+1,"%d",&lib_type);
	190	if (lib_type<0 || lib_type >= LASTLIB) {
	191	fprintf(stderr,"\n invalid library type: %d (>%d)- resetting\n%s\n",
	192	lib_type,LASTLIB,lib_p->file_name);
	193	lib_type=0;
	194	}
	195	} /* don't change lib_type if its not a number */
	196	}
	197	else if (lib_p->file_name[0] =='!') { /* check for script */
	198	lib_type = lib_p->lib_type = ACC_SCRIPT;
	199	}
	200
	201	/* check for stdin indicator '-' or '@' (or ACC_SCRIPT) */
	202	if (lib_p->file_name[0] == '-' || lib_p->file_name[0] == '@'
	203	|| lib_type == ACC_SCRIPT) {
	204	use_stdin = 1;
	205	}
	206	else use_stdin=0;
	207
	208	if (use_stdin && !(lib_type ==0 || lib_type==ACC_SCRIPT)) {
	209	fprintf(stderr,"\n @/- STDIN libraries must be in FASTA format\n");
	210	return NULL;
	211	}
	212
	213	opt_text[0]='\0';
	214	if (lib_type != ACC_SCRIPT) {
180	215	/* check to see if there is a file option ":1-100" */
181	216	#ifndef WIN32
182		if ((bp=strchr(lib_p->file_name,':'))!=NULL && *(bp+1)!='\0') {
	217	if ((bp=strchr(lib_p->file_name,':'))!=NULL && *(bp+1)!='\0') {
183	218	#else
184		if ((bp=strchr(lib_p->file_name+3,':'))!=NULL && *(bp+1)!='\0') {
	219	if ((bp=strchr(lib_p->file_name+3,':'))!=NULL && *(bp+1)!='\0') {
185	220	#endif
186		strncpy(opt_text,bp+1,sizeof(opt_text));
187		opt_text[sizeof(opt_text)-1]='\0';
188		*bp = '\0';
189		}
190		else opt_text[0]='\0';
191
192		if (lib_p->file_name[0] == '-' || lib_p->file_name[0] == '@') {
193		use_stdin = 1;
194		}
195		else use_stdin=0;
196
197		/* check for library type */
198		if ((bp=strchr(lib_p->file_name,' '))!=NULL) {
199		*bp='\0';
200		sscanf(bp+1,"%d",&lib_type);
201		if (lib_type<0 || lib_type >= LASTLIB) {
202		fprintf(stderr,"\n invalid library type: %d (>%d)- resetting\n%s\n",
203		lib_type,LASTLIB,lib_p->file_name);
204		lib_type=0;
205		}
206		else {
207		lib_p->lib_type = lib_type;
208		}
209		}
210		else lib_type = lib_p->lib_type;
211
212		if (use_stdin && lib_type !=0 ) {
213		fprintf(stderr,"\n @/- STDIN libraries must be in FASTA format\n");
214		return NULL;
	221	strncpy(opt_text,bp+1,sizeof(opt_text));
	222	opt_text[sizeof(opt_text)-1]='\0';
	223	*bp = '\0';
	224	}
215	225	}
216	226
217	227	/* check to see if file can be open()ed? */
218
219	228	l1:
220	229	opnflg = 0;
221	230	if (lib_type<=LASTTXT) {
222	231	if (!use_stdin) {
223	232	opnflg=((libf=fopen(lib_p->file_name,RBSTR))!=NULL);
	233	}
	234	else if (lib_type==ACC_SCRIPT) {
	235	bp = lib_p->file_name;
	236	if (lib_p->file_name[0] == '!') { bp += 1;}
	237	strncpy(acc_script, bp, sizeof(acc_script)-1);
	238	acc_script[sizeof(acc_script)-1] = '\0';
	239
	240	/* convert '+' in annot_script to ' ' */
	241	bp = strchr(acc_script,'+');
	242	for ( ; bp; bp=strchr(bp+1,'+')) {
	243	*bp=' ';
	244	}
	245	libf=popen(acc_script,"r");
	246	opnflg=1;
224	247	}
225	248	else {
226	249	libf=stdin;
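
The reworked block above reads an optional library type from the end of the file name, separated by a space or `^`, and only treats the suffix as a type when it starts with a digit. A minimal standalone sketch of that parsing step is shown below. The function `split_lib_type` and the bound `HYP_LASTLIB` are hypothetical and are not taken from the FASTA sources (the real limit is `LASTLIB`, whose value is not shown in this diff).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define HYP_LASTLIB 17 /* hypothetical upper bound; the real code uses LASTLIB */

/* Hypothetical illustration (not in the FASTA sources).
 * Splits "name 1" or "name^1" in place: the separator is overwritten with
 * '\0' and the numeric suffix is returned.  Returns 0 (the default type)
 * when no numeric suffix is present or the number is out of range. */
int split_lib_type(char *file_name) {
  char *bp;
  int lib_type = 0;

  if ((bp = strchr(file_name, ' ')) != NULL
      || (bp = strchr(file_name, '^')) != NULL) {
    if (isdigit((unsigned char)bp[1])) { /* only a digit marks a lib_type */
      *bp = '\0';
      if (sscanf(bp + 1, "%d", &lib_type) != 1
          || lib_type < 0 || lib_type >= HYP_LASTLIB) {
        lib_type = 0;
      }
    } /* otherwise leave the name untouched */
  }
  return lib_type;
}
```

For example, `"swissprot.lseg 1"` yields type 1 and the name `swissprot.lseg`, while a name whose separator is not followed by a digit is left unchanged with type 0.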

+2

-2

src/scaleswn.c

759	759
760	760	for (i=1; parm[i].gap > 0; i++) {
761	761	if (parm[i].gap > gap) continue;
762		else if (parm[i].gap == gap && parm[i].ext > ext ) continue;
763		else if (parm[i].gap == gap && parm[i].ext == ext) {
	762	else if (parm[i].gap <= gap && parm[i].ext > ext ) continue;
	763	else if (parm[i].gap <= gap && parm[i].ext <= ext) {
764	764	*K = parm[i].K;
765	765	*Lambda = parm[i].Lambda;
766	766	*H = parm[i].H;

+2

-0

src/structs.h

123	123	char sqnam[4]; /* "aa" or "nt" */
124	124	char sqtype[10]; /* "DNA" or "protein" */
125	125	int long_info; /* long description flag*/
	126	int blast_ident; /* calculate identities excluding gaps */
126	127	long sq0off, sq1off; /* virtual offset into aa0, aa1 */
127	128	int markx; /* alignment display type */
128	129	int tot_markx; /* markx as summ of all alternative markx */

156	157	int ashow_set; /* ashow set with -d */
157	158	int nmlen; /* length of name label */
158	159	int show_code; /* show alignment code in -m 9; ==1 => identity only, ==2 alignment code*/
	160	int m8_show_annot; /* show annotations only in -m 8CB output */
159	161	int tot_show_code; /* show alignment for all outputs */
160	162	int pre_load_done; /* set after pre_load_best() call */
161	163	int align_done; /* do_walign() called */

+15

-6

src/upam.h

202	202	-5, -11, -11, -11, -6, -9, -9, -12, -10, -1, -5, -9, -5, -8, -10, -10, -6, -17, -9, 8,
203	203	-8, -11, 3, 2, -14, -6, -5, -7, -5, -13, -15, -6, -10, -16, -9, -5, -6, -12, -12, -11, 8,
204	204	-7, -9, -6, -4, -17, 3, 2, -9, -6, -12, -9, -4, -8, -14, -7, -6, -7, -19, -12, -9, -4, 8,
205		-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
	205	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
	206	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 8
206	207	};
207	208
208	209	/*

240	241	-3, -9, -9, -9, -4, -7, -7, -10, -8, 1, -3, -8, -3, -6, -8, -8, -4, -13, -7, 7,
241	242	-6, -8, 3, 3, -11, -4, -3, -5, -4, -11, -12, -4, -8, -13, -7, -3, -5, -10, -10, -9, 8,
242	243	-5, -6, -4, -3, -13, 3, 3, -7, -4, -10, -7, -2, -6, -11, -5, -4, -5, -15, -9, -7, -2, 7,
243		-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
	244	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
	245	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 8
	246
244	247	};
245	248
246	249	/*

278	281	-1, -7, -7, -7, -2, -5, -6, -8, -6, 3, -1, -6, -1, -4, -6, -6, -2, -10, -5, 7,
279	282	-4, -5, 4, 3, -8, -2, -1, -3, -2, -8, -9, -2, -6, -10, -5, -2, -3, -8, -7, -7, 7,
280	283	-3, -4, -2, -1, -10, 4, 3, -5, -2, -7, -6, -1, -4, -9, -4, -3, -3, -12, -7, -5, 0, 7,
281		-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
	284	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
	285	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 8
	286
282	287	};
283	288
284	289	/*

316	321	0, -4, -5, -5, -1, -4, -4, -6, -4, 3, 0, -4, 0, -2, -4, -4, -1, -6, -4, 6,
317	322	-2, -3, 4, 4, -5, -1, 0, -1, 0, -6, -6, -1, -4, -7, -3, 0, -1, -6, -5, -5, 7,
318	323	-2, -1, -1, 0, -6, 4, 3, -3, -1, -5, -4, 0, -3, -6, -2, -1, -2, -8, -5, -4, 0, 6,
319		-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
	324	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
	325	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 8
	326
320	327	};
321	328
322	329	/*

354	361	0, -3, -4, -4, 0, -3, -3, -4, -3, 3, 1, -3, 1, -1, -3, -3, 0, -4, -2, 5,
355	362	-1, -2, 4, 4, -4, 0, 1, -1, 0, -4, -5, 0, -3, -5, -2, 0, 0, -5, -3, -4, 6,
356	363	-1, 0, 0, 0, -5, 3, 3, -2, 0, -4, -3, 1, -2, -4, -1, -1, -1, -6, -3, -3, 0, 5,
357		-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
	364	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
	365	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 6
358	366	};
359	367
360	368	/*

432	440	0, -3, -3, -4, 1, -2, -3, -4, -3, 4, 2, -3, 2, -1, -3, -2, 0, -4, -2, 4,
433	441	-1, -1, 4, 4, -3, 1, 2, 0, 0, -4, -4, 0, -3, -5, -1, 0, 0, -5, -3, -3, 6,
434	442	-1, 0, 1, 2, -3, 3, 3, -1, 1, -3, -3, 1, -2, -4, -1, 0, 0, -6, -3, -2, 2, 5,
435		-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
	443	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
	444	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 6
436	445	};
437	446
438	447	/*

+1

-1

src/url_subs.c

317	317	char line[MAX_STR];
318	318	int i, i_doms, n_domain_s = MAX_LSTR;
319	319
320		/* since (currently) annot_var_s is MAX_LSOTR, do the same for domain_s */
	320	/* since (currently) annot_var_s is MAX_LSTR, do the same for domain_s */
321	321	if ((domain_s = (char *)calloc(n_domain_s, sizeof(char)))==NULL) {
322	322	fprintf(stderr,"* error [%s:%d] * cannot allocate domain_s[%d]\n",__FILE__, __LINE__,n_domain_s);
323	323	return NULL;

+8

-4

src/wm_align.c

172	172
173	173	/* now we need alignment storage - get it */
174	174	if ((cur_ares->res = (int *)calloc((size_t)max_res,sizeof(int)))==NULL) {
175		fprintf(stderr," *** cannot allocate alignment results array %d\n",max_res);
	175	fprintf(stderr,"*** error [%s:%d] - cannot allocate alignment results array %d\n",
	176	__FILE__, __LINE__, max_res);
176	177	exit(1);
177	178	}
178	179

485	486
486	487	if ((f_ss = (struct swstr *) calloc (N+2, sizeof (struct swstr)))
487	488	== NULL) {
488		fprintf (stderr, " *** cannot allocate f_ss array %3d\n", N+2);
	489	fprintf (stderr, "*** error [%s:%d] - cannot allocate f_ss array %3d\n",
	490	__FILE__, __LINE__, N+2);
489	491	exit (1);
490	492	}
491	493	f_ss++;
492	494
493	495	if ((r_ss = (struct swstr *) calloc (N+2, sizeof (struct swstr)))
494	496	== NULL) {
495		fprintf (stderr, " *** cannot allocate r_ss array %3d\n", N+2);
	497	fprintf (stderr, "*** error [%s:%d] - cannot allocate r_ss array %3d\n",
	498	__FILE__, __LINE__, N+2);
496	499	exit (1);
497	500	}
498	501	r_ss++;

502	505
503	506	ck = CHECK_SCORE(IW,B,M,N,S,W,G,H,NC, &sw);
504	507	if (c != ck) {
505		fprintf(stderr," * Check_score error. %d != %d *\n",c,ck);
	508	fprintf(stderr,"* error [%s:%d] - check_score error. %d != %d *\n",
	509	__FILE__, __LINE__, c,ck);
506	510	}
507	511
508	512	f_ss--; r_ss--;

+25

-22

test/test.sh

5	5	if [ ! -d results ]; then
6	6	mkdir results
7	7	fi
	8
	9	export FA_DB=/slib2/fa_dbs/qfo20.lseg
	10
8	11	echo "starting fasta36 - protein" `date`
9		../bin/fasta36 -q -m 6 -Z 100000 ../seq/mgstm1.aa:1-100 q > results/test_m1.ok2.html
10		../bin/fasta36 -S -q -z 11 -O results/test_m1.ok2_p25 -s P250 ../seq/mgstm1.aa:100-218 q
	12	../bin/fasta36 -q -m 6 -Z 100000 ../seq/mgstm1.aa:1-100 $FA_DB > results/test_m1.ok2.html
	13	../bin/fasta36 -S -q -z 11 -O results/test_m1.ok2_p25 -s P250 ../seq/mgstm1.aa:100-218 $FA_DB
11	14	echo "done"
12	15	echo "starting fastxy36" `date`
13		../bin/fastx36 -m 9c -S -q ../seq/mgtt2_x.seq q 1 > results/test_t2.xk1
14		../bin/fasty36 -S -q ../seq/mgtt2_x.seq q > results/test_t2.yk2
15		../bin/fastx36 -m 9c -S -q -z 2 ../seq/mgstm1.esq a > results/test_m1.xk2z2
16		../bin/fasty36 -S -q -z 2 ../seq/mgstm1.esq a > results/test_m1.yk2z2
	16	../bin/fastx36 -m 9c -S -q ../seq/mgtt2_x.seq $FA_DB 1 > results/test_t2.xk1
	17	../bin/fasty36 -S -q ../seq/mgtt2_x.seq $FA_DB > results/test_t2.yk2
	18	../bin/fastx36 -m 9c -S -q -z 2 ../seq/mgstm1.esq $FA_DB > results/test_m1.xk2z2
	19	../bin/fasty36 -S -q -z 2 ../seq/mgstm1.esq $FA_DB > results/test_m1.yk2z2
17	20	echo "done"
18	21	echo "starting fastxy36 rev" `date`
19		../bin/fastx36 -m 9c -q -m 5 ../seq/mgstm1.rev q > results/test_m1.xk2r
20		../bin/fasty36 -q -m 5 -M 200-300 -z 2 ../seq/mgstm1.rev q > results/test_m1.yk2rz2
21		../bin/fasty36 -q -m 5 -z 11 ../seq/mgstm1.rev q > results/test_m1.yk2rz11
	22	../bin/fastx36 -m 9c -q -m 5 ../seq/mgstm1.rev $FA_DB > results/test_m1.xk2r
	23	../bin/fasty36 -q -m 5 -M 200-300 -z 2 ../seq/mgstm1.rev $FA_DB > results/test_m1.yk2rz2
	24	../bin/fasty36 -q -m 5 -z 11 ../seq/mgstm1.rev $FA_DB > results/test_m1.yk2rz11
22	25	echo "done"
23	26	echo "starting ssearch36" `date`
24		../bin/ssearch36 -m 9c -S -z 3 -q ../seq/mgstm1.aa q > results/test_m1.ssz3
25		../bin/ssearch36 -q -M 200-300 -z 2 -Z 100000 -s P250 ../seq/mgstm1.aa q > results/test_m1.ss_p25
	27	../bin/ssearch36 -m 9c -S -z 3 -q ../seq/mgstm1.aa $FA_DB > results/test_m1.ssz3
	28	../bin/ssearch36 -q -M 200-300 -z 2 -Z 100000 -s P250 ../seq/mgstm1.aa $FA_DB > results/test_m1.ss_p25
26	29	echo "done"
27	30	if [ -e ../bin/ssearch36s ]; then
28	31	echo "starting ssearch36s" `date`
29		../bin/ssearch36s -m 9c -S -z 3 -q ../seq/mgstm1.aa q > results/test_m1.sssz3
30		../bin/ssearch36s -q -M 200-300 -z 2 -Z 100000 -s P250 ../seq/mgstm1.aa q > results/test_m1.sss_p25
	32	../bin/ssearch36s -m 9c -S -z 3 -q ../seq/mgstm1.aa $FA_DB > results/test_m1.sssz3
	33	../bin/ssearch36s -q -M 200-300 -z 2 -Z 100000 -s P250 ../seq/mgstm1.aa $FA_DB > results/test_m1.sss_p25
31	34	echo "done"
32	35	fi
33	36	echo "starting prss36(ssearch/fastx)" `date`

35	38	../bin/fastx36 -q -k 1000 ../seq/mgstm1.esq ../seq/xurt8c.aa > results/test_m1.rfx
36	39	echo "done"
37	40	echo "starting ggsearch36/glsearch36" `date`
38		../bin/ggsearch36 -q -m 9i -w 80 ../seq/hahu.aa q > results/test_h1.gg
39		../bin/glsearch36 -q -m 9i -w 80 ../seq/hahu.aa q > results/test_h1.gl
40		../bin/ggsearch36 -q ../seq/gtt1_drome.aa q > results/test_t1.gg
41		../bin/glsearch36 -q ../seq/gtt1_drome.aa q > results/test_t1.gl
	41	../bin/ggsearch36 -q -m 9i -w 80 ../seq/hahu.aa $FA_DB > results/test_h1.gg
	42	../bin/glsearch36 -q -m 9i -w 80 ../seq/hahu.aa $FA_DB > results/test_h1.gl
	43	../bin/ggsearch36 -q ../seq/gtt1_drome.aa $FA_DB > results/test_t1.gg
	44	../bin/glsearch36 -q ../seq/gtt1_drome.aa $FA_DB > results/test_t1.gl
42	45	echo "done"
43	46	echo "starting fasta36 - DNA" `date`
44	47	../bin/fasta36 -S -q ../seq/mgstm1.nt %RMB 4 > results/test_m1.ok4

52	55	../bin/tfasty36 -q -i -3 -N 5000 ../seq/mgstm1.aa %p > results/test_m1.ty2
53	56	echo "done"
54	57	echo "starting fastf36" `date`
55		../bin/fastf36 -q ../seq/m1r.aa q > results/test_mf.ff
56		../bin/fastf36 -q ../seq/m1r.aa q > results/test_mf.ff_s
	58	../bin/fastf36 -q ../seq/m1r.aa $FA_DB > results/test_mf.ff
	59	../bin/fastf36 -q ../seq/m1r.aa $FA_DB > results/test_mf.ff_s
57	60	echo "done"
58	61	echo "starting tfastf36" `date`
59	62	../bin/tfastf36 -q ../seq/m1r.aa %r > results/test_mf.tfr
60	63	echo "done"
61	64	echo "starting fasts36" `date`
62		../bin/fasts36 -q -V '*?@' ../seq/ngts.aa q > results/test_m1.fs1
63		../bin/fasts36 -q ../seq/ngt.aa q > results/test_m1.fs
	65	../bin/fasts36 -q -V '*?@' ../seq/ngts.aa $FA_DB > results/test_m1.fs1
	66	../bin/fasts36 -q ../seq/ngt.aa $FA_DB > results/test_m1.fs
64	67	../bin/fasts36 -q -n ../seq/mgstm1.nts m > results/test_m1.nfs
65	68	echo "starting fastm36" `date`
66		../bin/fastm36 -q ../seq/ngts.aa q > results/test_m1.fm
	69	../bin/fastm36 -q ../seq/ngts.aa $FA_DB > results/test_m1.fm
67	70	../bin/fastm36 -q -n ../seq/mgstm1.nts m > results/test_m1.nfm
68	71	echo "done"
69	72	echo "starting tfasts36" `date`

+15

-12

test/test2V.sh

3	3	echo `uname -a`
4	4	echo ""
5	5	echo "starting fasta36 - protein" `date`
	6
	7	FA_DB=/slib2/fa_dbs/qfo20.lseg
	8
6	9	if [ ! -d results ]; then
7	10	mkdir results
8	11	fi
9		../bin/fasta36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -S -z 21 -s BP62 ../seq/gstm1_human.vaa q > results/test2V_m1.ok2_bp62
10		../bin/fasta36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -S -z 21 ../seq/gstm1_human.vaa q > results/test2V_m1.ok2_z21
11		../bin/fasta36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -S -m BB ../seq/gstm1_human.vaa q > results/test2V_m1.ok2mB
	12	../bin/fasta36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -S -z 21 -s BP62 ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ok2_bp62
	13	../bin/fasta36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -S -z 21 ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ok2_z21
	14	../bin/fasta36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -S -m BB ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ok2mB
12	15	echo "done"
13	16	echo "starting fastxy36" `date`
14		../bin/fastx36 -V \!../scripts/ann_feats_up_www2.pl -m 9c -S -q ../seq/mgtt2_x.seq q > results/test2V_t2.xk2m9c
15		../bin/fastx36 -V \!../scripts/ann_feats_up_www2.pl -m BB -S -q ../seq/mgtt2_x.seq q > results/test2V_t2.xk2mB
16		../bin/fastx36 -V \!../scripts/ann_feats_up_www2.pl -m 9c -S -q -z 22 ../seq/gstm1b_human.nt q > results/test2V_m1.xk2m9cz22
17		../bin/fasty36 -V \!../scripts/ann_feats_up_www2.pl -S -q -z 21 ../seq/gstm1b_human.nt q > results/test2V_m1.yk2z21
	17	../bin/fastx36 -V \!../scripts/ann_feats_up_www2.pl -m 9c -S -q ../seq/mgtt2_x.seq $FA_DB > results/test2V_t2.xk2m9c
	18	../bin/fastx36 -V \!../scripts/ann_feats_up_www2.pl -m BB -S -q ../seq/mgtt2_x.seq $FA_DB > results/test2V_t2.xk2mB
	19	../bin/fastx36 -V \!../scripts/ann_feats_up_www2.pl -m 9c -S -q -z 22 ../seq/gstm1b_human.nt $FA_DB > results/test2V_m1.xk2m9cz22
	20	../bin/fasty36 -V \!../scripts/ann_feats_up_www2.pl -S -q -z 21 ../seq/gstm1b_human.nt $FA_DB > results/test2V_m1.yk2z21
18	21	echo "done"
19	22	echo "starting ssearch36" `date`
20		../bin/ssearch36 -V q\!../scripts/ann_pfam_www.pl -V \!../scripts/ann_pfam_www.pl -m 9c -S -z 22 -q ../seq/gstm1_human.vaa q > results/test2V_m1.ssm9cz22
21		../bin/ssearch36 -V q\!../scripts/ann_pfam_www.pl -V \!../scripts/ann_pfam_www.pl -m 9C -S -z 21 -q ../seq/gstm1_human.vaa q > results/test2V_m1.ssm9Cz21
22		../bin/ssearch36 -V q\!../scripts/ann_pfam_www.pl -V \!../scripts/ann_pfam_www.pl -m 8CC -S -q ../seq/gstm1_human.vaa q > results/test2V_m1.ssm8CC
	23	../bin/ssearch36 -V q\!../scripts/ann_pfam_www.pl -V \!../scripts/ann_pfam_www.pl -m 9c -S -z 22 -q ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ssm9cz22
	24	../bin/ssearch36 -V q\!../scripts/ann_pfam_www.pl -V \!../scripts/ann_pfam_www.pl -m 9C -S -z 21 -q ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ssm9Cz21
	25	../bin/ssearch36 -V q\!../scripts/ann_pfam_www.pl -V \!../scripts/ann_pfam_www.pl -m 8CC -S -q ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ssm8CC
23	26	echo "done" `date`
24	27	echo "starting ssearch36" `date`
25		../bin/ggsearch36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -m 9c -S -q ../seq/gstm1_human.vaa q > results/test2V_m1.ggm9c
26		../bin/ggsearch36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -m 9C -S -z 21 -q ../seq/gstm1_human.vaa q > results/test2V_m1.ggm9Cz21
	28	../bin/ggsearch36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -m 9c -S -q ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ggm9c
	29	../bin/ggsearch36 -V q\!../scripts/ann_feats_up_www2.pl -V \!../scripts/ann_feats_up_www2.pl -m 9C -S -z 21 -q ../seq/gstm1_human.vaa $FA_DB > results/test2V_m1.ggm9Cz21
27	30	echo "done" `date`
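
The hunks above replace the literal `q` library argument with `$FA_DB`, which this commit sets to a site-specific path (`/slib2/fa_dbs/qfo20.lseg`). As a minimal sketch, not part of the commit, the assignment could instead keep any value already set in the environment and warn when the library file is missing. The `check_db` helper is hypothetical and is introduced here only for illustration.

```shell
#!/bin/sh
# Sketch only: keep the commit's path as the default, but let the caller
# override it, e.g.  FA_DB=/path/to/library sh test2V.sh
FA_DB=${FA_DB:-/slib2/fa_dbs/qfo20.lseg}

# check_db is a hypothetical helper (not in the FASTA test scripts):
# it returns 0 if the library file is readable, 1 otherwise.
check_db() {
  if [ ! -r "$1" ]; then
    echo "warning: library $1 is not readable" >&2
    return 1
  fi
  return 0
}

# Warn early instead of letting every search in the script fail separately.
check_db "$FA_DB" || echo "set FA_DB to a readable protein library before running the tests"
```

With this form the scripts run unchanged on the original host, while other machines can point `FA_DB` at a local library without editing the script.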