New upstream snapshot. Debian Janitor 2 years ago
37 changed file(s) with 376 addition(s) and 1130 deletion(s).
+0
-35
.gitignore
0 *.py[cod]
1
2 # C extensions
3 *.so
4
5 # Packages
6 *.egg
7 *.egg-info
8 dist
9 build
10 eggs
11 parts
12 bin
13 var
14 sdist
15 develop-eggs
16 .installed.cfg
17 lib
18 lib64
19
20 # Installer logs
21 pip-log.txt
22
23 # Unit test / coverage reports
24 .coverage
25 .tox
26 nosetests.xml
27
28 # Translations
29 *.mo
30
31 # Mr Developer
32 .mr.developer.cfg
33 .project
34 .pydevproject
+0
-6
.travis.yml
0 language: python
1 python:
2 - "3.4"
3 sudo: false
4 script:
5 - "python setup.py test"
0 Metadata-Version: 2.1
1 Name: pyfastaq
2 Version: 3.17.0
3 Summary: Script to manipulate FASTA and FASTQ files, plus API for developers
4 Home-page: https://github.com/sanger-pathogens/Fastaq
5 Author: Martin Hunt
6 Author-email: path-help@sanger.ac.uk
7 License: GPLv3
8 Platform: UNKNOWN
9 Classifier: Development Status :: 4 - Beta
10 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
11 Classifier: Programming Language :: Python :: 3 :: Only
12 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
13 License-File: LICENSE
14 License-File: AUTHORS
15
16 UNKNOWN
17
0 Fastaq
1 ======
0 # Fastaq
1 Manipulate FASTA and FASTQ files
22
3 [![Build Status](https://travis-ci.org/sanger-pathogens/Fastaq.svg?branch=master)](https://travis-ci.org/sanger-pathogens/Fastaq)
4 [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/sanger-pathogens/Fastaq/blob/master/LICENSE)
5
6 ## Contents
7 * [Introduction](#introduction)
8 * [Installation](#installation)
9 * [Using pip3](#using-pip3)
10 * [From source](#from-source)
11 * [Running the tests](#running-the-tests)
12 * [Usage](#usage)
13 * [Examples](#examples)
14 * [Available commands](#available-commands)
15 * [For developers](#for-developers)
16 * [License](#license)
17 * [Feedback/Issues](#feedbackissues)
18
19 ## Introduction
320 Python3 script to manipulate FASTA and FASTQ (and other format) files, plus API for developers
421
5 Installation
6 ------------
22 ## Installation
23 There are a number of ways to install Fastaq and details are provided below. If you encounter an issue when installing Fastaq please contact your local system administrator. If you encounter a bug please log it [here](https://github.com/sanger-pathogens/Fastaq/issues) or email us at path-help@sanger.ac.uk.
724
8 Install with pip3:
25 ### Using pip3
926
10 pip3 install pyfastaq
27 `pip3 install pyfastaq`
1128
29 ### From source
1230
13 Alternatively, you can download the latest release from this github repository,
14 or clone the repository. Then run the tests:
31 Download the latest release from this github repository or clone the repository. Then run the tests:
1532
16 python3 setup.py test
33 `python3 setup.py test`
1734
1835 If the tests all pass, install:
1936
20 python3 setup.py install
37 `python3 setup.py install`
2138
39 ### Running the tests
2240
23 Usage
24 -----
41 The test can be run from the top level directory:
42
43 `python3 setup.py test`
44
45 ### Runtime dependencies
46
47 These must be available in your path at run time:
48 * samtools 0.1.19
49 * gzip
50 * gunzip
51
52 ## Usage
2553
2654 The installation will put a single script called `fastaq` in your path.
2755 The usage is:
2856
29 fastaq <command> [options]
30
57 `fastaq <command> [options]`
3158
3259 Key points:
3360 * To list the available commands and brief descriptions, just run `fastaq`
3966 * Input and output files can be gzipped. An input file is assumed to be gzipped if its name ends with .gz. To gzip an output file, just name it with .gz at the end.
4067 * You can use a minus sign for a filename to use stdin or stdout, so commands can be piped together. See the example below.
4168
42
43 Examples
44 --------
69 ### Examples
4570
4671 Reverse complement all sequences in a file:
4772
48 fastaq reverse_complement in.fastq out.fastq
73 `fastaq reverse_complement in.fastq out.fastq`
4974
5075 Reverse complement all sequences in a gzipped file, then translate each sequence:
5176
52 fastaq reverse_complement in.fastq.gz - | fastaq translate - out.fasta
77 `fastaq reverse_complement in.fastq.gz - | fastaq translate - out.fasta`
5378
5479
55 Available commands
56 ------------------
80 ### Available commands
5781
5882 | Command | Description |
5983 |-----------------------|----------------------------------------------------------------------|
97121 | version | Print version number and exit |
98122
99123
100 For developers
101 --------------
124 ### For developers
102125
103126 Here is a template for counting the sequences in a FASTA or FASTQ file:
104
105 from pyfastaq import sequences
106 seq_reader = sequences.file_reader(infile)
107 count = 0
108 for seq in seq_reader:
109     count += 1
110 print(count)
111
127 ```
128 from pyfastaq import sequences
129 seq_reader = sequences.file_reader(infile)
130 count = 0
131 for seq in seq_reader:
132     count += 1
133 print(count)
134 ```
112135 Hopefully you get the idea and there are plenty of examples in tasks.py. Detection of the input file type and whether gzipped or not is automatic. See help(sequences) for the various methods already defined in the classes Fasta and Fastq.
113136
114 ---------------------------------
137 ## License
138 Fastaq is free software, licensed under [GPLv3](https://github.com/sanger-pathogens/Fastaq/blob/master/LICENSE).
115139
116 Build status: [![Build Status](https://travis-ci.org/sanger-pathogens/Fastaq.svg?branch=master)](https://travis-ci.org/sanger-pathogens/Fastaq)
117
118
140 ## Feedback/Issues
141 Please report any issues to the [issues page](https://github.com/sanger-pathogens/Fastaq/issues) or email path-help@sanger.ac.uk.
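
The README's key points about gzipped files and `-` filenames can be sketched with the standard library alone. This is a minimal illustration, not pyfastaq's own implementation: `open_for_reading` is a hypothetical helper name, and the only rules it encodes are the two stated in the README (a name ending in `.gz` means gzipped input, a minus sign means stdin).

```python
import gzip
import sys


def open_for_reading(filename):
    """Return a text-mode handle for a sequence file (hypothetical helper).

    '-' means stdin; a name ending in '.gz' is assumed to be gzipped;
    anything else is opened as plain text.
    """
    if filename == '-':
        return sys.stdin
    if filename.endswith('.gz'):
        # 'rt' decompresses on the fly and yields str rather than bytes
        return gzip.open(filename, 'rt')
    return open(filename)
```

Keying the decision on the file name rather than on the file contents keeps the rule predictable when commands are piped together.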
0 fastaq (3.17.0+git20211102.1.3a993b9-1) UNRELEASED; urgency=low
1
2 * New upstream snapshot.
3
4 -- Debian Janitor <janitor@jelmer.uk> Sat, 13 Nov 2021 00:14:36 -0000
5
06 fastaq (3.17.0-4) unstable; urgency=medium
17
28 [ Steffen Moeller ]
99 parser.add_argument('--regex', help='If given, only reads with a name matching the regular expression will be kept')
1010 parser.add_argument('--ids_file', help='If given, only reads whose ID is in th given file will be used. One ID per line of file.', metavar='FILENAME')
1111 parser.add_argument('-v', '--invert', action='store_true', help='Only keep sequences that do not match the filters')
12 parser.add_argument('--check_comments', action='store_true', help='Search the header comments also for the given regex. Can only be specified with --regex')
1213
1314 mate_group = parser.add_argument_group('Mate file for read pairs options')
1415 mate_group.add_argument('--mate_in', help='Name of mates input file. If used, must also provide --mate_out', metavar='FILENAME')
2829 mate_in=options.mate_in,
2930 mate_out=options.mate_out,
3031 both_mates_pass=options.both_mates_pass,
32 check_comments=options.check_comments,
3133 )
44 from pyfastaq import sequences, utils, caf
55
66 class Error (Exception): pass
7
8
9 class IncompatibleParametersError(Exception):
10 pass
11
712
813 def acgtn_only(infile, outfile):
914 '''Replace every non-acgtn (case insensitve) character with an N'''
283288 mate_in=None,
284289 mate_out=None,
285290 both_mates_pass=True,
291 check_comments=False
286292 ):
293 if check_comments and not regex:
294 raise IncompatibleParametersError(
295 "--check_comments can only be passed with --regex"
296 )
287297
288298 ids_from_file = set()
289299 if ids_file is not None:
308318 def passes(seq, name_regex):
309319 # remove trailing comments from FASTQ readname lines
310320 matches = name_regex.match(seq.id)
311 if matches is not None:
321 if matches is not None and not check_comments:
312322 clean_seq_id = matches.group(1)
313323 else:
314324 clean_seq_id = seq.id
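
The effect of the new `check_comments` flag can be sketched in isolation. The following is a simplified stand-in for the hunk above, not the upstream function: `NAME_REGEX`, `clean_id` and `keep` are hypothetical names, and the pattern is an assumption (the diff does not show how `name_regex` is compiled). It only illustrates that the trailing comment is stripped from the read name unless `check_comments` is set, and that the flag is rejected without a regex.

```python
import re


class IncompatibleParametersError(Exception):
    pass


# Assumption: capture the read name up to the first whitespace character.
NAME_REGEX = re.compile(r'^([^\s]+).*?$')


def clean_id(seq_id, check_comments=False):
    """Strip the trailing comment unless comments should be searched too."""
    matches = NAME_REGEX.match(seq_id)
    if matches is not None and not check_comments:
        return matches.group(1)
    return seq_id


def keep(seq_id, regex=None, check_comments=False):
    """Return True if the (possibly cleaned) name passes the regex filter."""
    if check_comments and not regex:
        raise IncompatibleParametersError(
            "--check_comments can only be passed with --regex"
        )
    if regex is None:
        return True
    return re.search(regex, clean_id(seq_id, check_comments)) is not None
```

With a header such as `A1234::12967:1677 stuff_to_remove`, a regex that matches only the comment text passes when `check_comments=True` and fails otherwise, because the comment has already been removed from the name.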
+0
-48
pyfastaq/tests/data/caf_test.caf
0
1 DNA : read1.p1k
2 NACG
3 TAN
4
5 BaseQuality : read1.p1k
6 4 24 42 43 40 30 8
7
8 Sequence : read1.p1k
9 Is_read
10 SCF_File read1.p1kSCF
11 Template read1
12 Insert_size 2000 4000
13 Ligation_no 12345
14 Primer Universal_primer
15 Strand Forward
16 Dye Dye_terminator
17 Clone clone1
18 Seq_vec SVEC 1 15 puc19
19 Sequencing_vector "puc19"
20 Clipping QUAL 2 6
21 ProcessStatus PASS
22 Asped 2006-7-5
23 Unpadded
24 Align_to_SCF 1 1272 1 1272
25
26 DNA : read2.p1k
27 CG
28 ACGTT
29
30 BaseQuality : read2.p1k
31 9 9 40 41 42 42 4
32
33 Sequence : read2.p1k
34 Is_read
35 SCF_File read2.p1kSCF
36 Template read2
37 Insert_size 2000 4000
38 Ligation_no 23456
39 Primer Universal_primer
40 Strand Forward
41 Dye Dye_terminator
42 Clone clone2
43 Seq_vec SVEC 1 32 puc19
44 Sequencing_vector "puc19"
45 ProcessStatus PASS
46 Unpadded
47 Align_to_SCF 1 1347 1 1347
+0
-8
pyfastaq/tests/data/caf_test.to_fastq.no_trim.min_length_0.fq
0 @read1.p1k
1 NACGTAN
2 +
3 %9KLI?)
4 @read2.p1k
5 CGACGTT
6 +
7 **IJKK%
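
The quality strings in this expected FASTQ output appear consistent with the `BaseQuality` values in `caf_test.caf` encoded with an ASCII offset of 33 (Phred+33). The sketch below checks that relationship; `phred_to_fastq` is a hypothetical helper, not a pyfastaq function, and the offset is an assumption inferred from the test data shown here.

```python
def phred_to_fastq(scores, offset=33):
    """Encode integer quality scores as a FASTQ quality string."""
    return ''.join(chr(q + offset) for q in scores)


# BaseQuality lines for read1.p1k and read2.p1k from caf_test.caf
read1_quals = [4, 24, 42, 43, 40, 30, 8]
read2_quals = [9, 9, 40, 41, 42, 42, 4]

print(phred_to_fastq(read1_quals))  # %9KLI?)
print(phred_to_fastq(read2_quals))  # **IJKK%
```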
+0
-4
pyfastaq/tests/data/caf_test.to_fastq.trim.min_length_6.fq
0 @read2.p1k
1 CGACGTT
2 +
3 **IJKK%
+0
-20
pyfastaq/tests/data/readnames_with_comments.fastq
0 @A1234::15950:1663 stuff_to_remove
1 TCGTAAGCCTGCTCGAGC
2 +
3 >>3>>44@CFFFGG??EE
4 @A1234::16080:1672 stuff_to_remove
5 CCATCGTCTTCGCCCTGC
6 +
7 111AA1AAAAF1EAEGAG
8 @A1234::12967:1677 stuff_to_remove
9 CTCCAGCATCGTGCAAAT
10 +
11 3>>A?@CBDFAAACCBAF
12 @A1234::16114:1681 stuff_to_remove
13 TTGATATAGAGATACTTC
14 +
15 3>A3A5D55DBFFDFGGG
16 @A1234::16669:1683 stuff_to_remove
17 CTGCGCGACTATACGCAG
18 +
19 1>1>>>A1>D?FF10E0A
+0
-4
pyfastaq/tests/data/readnames_with_comments.fastq.filtered
0 @A1234::12967:1677 stuff_to_remove
1 CTCCAGCATCGTGCAAAT
2 +
3 3>>A?@CBDFAAACCBAF
+0
-1
pyfastaq/tests/data/readnames_with_comments.fastq.ids
0 A1234::12967:1677
+0
-203
pyfastaq/tests/data/sequences_test.embl
0 ID seq1; SV 1; linear; mRNA; STD; PLN; 1859 BP.
1 XX
2 AC X56734; S46826;
3 XX
4 DT 12-SEP-1991 (Rel. 29, Created)
5 DT 25-NOV-2005 (Rel. 85, Last updated, Version 11)
6 XX
7 DE Trifolium repens mRNA for non-cyanogenic beta-glucosidase
8 XX
9 KW beta-glucosidase.
10 XX
11 OS Trifolium repens (white clover)
12 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
13 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
14 OC fabids; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
15 XX
16 RN [5]
17 RP 1-1859
18 RX DOI; 10.1007/BF00039495.
19 RX PUBMED; 1907511.
20 RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
21 RT "Nucleotide and derived amino acid sequence of the cyanogenic
22 RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)";
23 RL Plant Mol. Biol. 17(2):209-219(1991).
24 XX
25 RN [6]
26 RP 1-1859
27 RA Hughes M.A.;
28 RT ;
29 RL Submitted (19-NOV-1990) to the INSDC.
30 RL Hughes M.A., University of Newcastle Upon Tyne, Medical School, Newcastle
31 RL Upon Tyne, NE2 4HH, UK
32 XX
33 DR EuropePMC; PMC99098; 11752244.
34 XX
35 FH Key Location/Qualifiers
36 FH
37 FT source 1..1859
38 FT /organism="Trifolium repens"
39 FT /mol_type="mRNA"
40 FT /clone_lib="lambda gt10"
41 FT /clone="TRE361"
42 FT /tissue_type="leaves"
43 FT /db_xref="taxon:3899"
44 FT mRNA 1..1859
45 FT /experiment="experimental evidence, no additional details
46 FT recorded"
47 FT CDS 14..1495
48 FT /product="beta-glucosidase"
49 FT /EC_number="3.2.1.21"
50 FT /note="non-cyanogenic"
51 FT /db_xref="GOA:P26204"
52 FT /db_xref="InterPro:IPR001360"
53 FT /db_xref="InterPro:IPR013781"
54 FT /db_xref="InterPro:IPR017853"
55 FT /db_xref="InterPro:IPR018120"
56 FT /db_xref="UniProtKB/Swiss-Prot:P26204"
57 FT /protein_id="CAA40058.1"
58 FT /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
59 FT FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
60 FT DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
61 FT VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
62 FT CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
63 FT DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
64 FT IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
65 FT EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
66 FT IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
67 XX
68 SQ Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
69 aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt 60
70 cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag 120
71 tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga 180
72 aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata 240
73 tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta 300
74 caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc 360
75 ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaatcaa 420
76 atattacaac aaccttatca acgaactatt ggctaacggt atacaaccat ttgtaactct 480
77 ttttcattgg gatcttcccc aagtcttaga agatgagtat ggtggtttct taaactccgg 540
78 tgtaataaat gattttcgag actatacgga tctttgcttc aaggaatttg gagatagagt 600
79 gaggtattgg agtactctaa atgagccatg ggtgtttagc aattctggat atgcactagg 660
80 aacaaatgca ccaggtcgat gttcggcctc caacgtggcc aagcctggtg attctggaac 720
81 aggaccttat atagttacac acaatcaaat tcttgctcat gcagaagctg tacatgtgta 780
82 taagactaaa taccaggcat atcaaaaggg aaagataggc ataacgttgg tatctaactg 840
83 gttaatgcca cttgatgata atagcatacc agatataaag gctgccgaga gatcacttga 900
84 cttccaattt ggattgttta tggaacaatt aacaacagga gattattcta agagcatgcg 960
85 gcgtatagtt aaaaaccgat tacctaagtt ctcaaaattc gaatcaagcc tagtgaatgg 1020
86 ttcatttgat tttattggta taaactatta ctcttctagt tatattagca atgccccttc 1080
87 acatggcaat gccaaaccca gttactcaac aaatcctatg accaatattt catttgaaaa 1140
88 acatgggata cccttaggtc caagggctgc ttcaatttgg atatatgttt atccatatat 1200
89 gtttatccaa gaggacttcg agatcttttg ttacatatta aaaataaata taacaatcct 1260
90 gcaattttca atcactgaaa atggtatgaa tgaattcaac gatgcaacac ttccagtaga 1320
91 agaagctctt ttgaatactt acagaattga ttactattac cgtcacttat actacattcg 1380
92 ttctgcaatc agggctggct caaatgtgaa gggtttttac gcatggtcat ttttggactg 1440
93 taatgaatgg tttgcaggct ttactgttcg ttttggatta aactttgtag attagaaaga 1500
94 tggattaaaa aggtacccta agctttctgc ccaatggtac aagaactttc tcaaaagaaa 1560
95 ctagctagta ttattaaaag aactttgtag tagattacag tacatcgttt gaagttgagt 1620
96 tggtgcacct aattaaataa aagaggttac tcttaacata tttttaggcc attcgttgtg 1680
97 aagttgttag gctgttattt ctattatact atgttgtagt aataagtgca ttgttgtacc 1740
98 agaagctatg atcataacta taggttgatc cttcatgtat cagtttgatg ttgagaatac 1800
99 tttgaattaa aagtcttttt ttattttttt aaaaaaaaaa aaaaaaaaaa aaaaaaaaa 1859
100 //
101 ID seq2; SV 1; linear; mRNA; STD; PLN; 1859 BP.
102 XX
103 AC X56734; S46826;
104 XX
105 DT 12-SEP-1991 (Rel. 29, Created)
106 DT 25-NOV-2005 (Rel. 85, Last updated, Version 11)
107 XX
108 DE Trifolium repens mRNA for non-cyanogenic beta-glucosidase
109 XX
110 KW beta-glucosidase.
111 XX
112 OS Trifolium repens (white clover)
113 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
114 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
115 OC fabids; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
116 XX
117 RN [5]
118 RP 1-1859
119 RX DOI; 10.1007/BF00039495.
120 RX PUBMED; 1907511.
121 RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
122 RT "Nucleotide and derived amino acid sequence of the cyanogenic
123 RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)";
124 RL Plant Mol. Biol. 17(2):209-219(1991).
125 XX
126 RN [6]
127 RP 1-1859
128 RA Hughes M.A.;
129 RT ;
130 RL Submitted (19-NOV-1990) to the INSDC.
131 RL Hughes M.A., University of Newcastle Upon Tyne, Medical School, Newcastle
132 RL Upon Tyne, NE2 4HH, UK
133 XX
134 DR EuropePMC; PMC99098; 11752244.
135 XX
136 FH Key Location/Qualifiers
137 FH
138 FT source 1..1859
139 FT /organism="Trifolium repens"
140 FT /mol_type="mRNA"
141 FT /clone_lib="lambda gt10"
142 FT /clone="TRE361"
143 FT /tissue_type="leaves"
144 FT /db_xref="taxon:3899"
145 FT mRNA 1..1859
146 FT /experiment="experimental evidence, no additional details
147 FT recorded"
148 FT CDS 14..1495
149 FT /product="beta-glucosidase"
150 FT /EC_number="3.2.1.21"
151 FT /note="non-cyanogenic"
152 FT /db_xref="GOA:P26204"
153 FT /db_xref="InterPro:IPR001360"
154 FT /db_xref="InterPro:IPR013781"
155 FT /db_xref="InterPro:IPR017853"
156 FT /db_xref="InterPro:IPR018120"
157 FT /db_xref="UniProtKB/Swiss-Prot:P26204"
158 FT /protein_id="CAA40058.1"
159 FT /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
160 FT FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
161 FT DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
162 FT VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
163 FT CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
164 FT DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
165 FT IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
166 FT EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
167 FT IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
168 XX
169 SQ Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
170 aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt 60
171 cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag 120
172 tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga 180
173 aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata 240
174 tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta 300
175 caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc 360
176 ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaatcaa 420
177 atattacaac aaccttatca acgaactatt ggctaacggt atacaaccat ttgtaactct 480
178 ttttcattgg gatcttcccc aagtcttaga agatgagtat ggtggtttct taaactccgg 540
179 tgtaataaat gattttcgag actatacgga tctttgcttc aaggaatttg gagatagagt 600
180 gaggtattgg agtactctaa atgagccatg ggtgtttagc aattctggat atgcactagg 660
181 aacaaatgca ccaggtcgat gttcggcctc caacgtggcc aagcctggtg attctggaac 720
182 aggaccttat atagttacac acaatcaaat tcttgctcat gcagaagctg tacatgtgta 780
183 taagactaaa taccaggcat atcaaaaggg aaagataggc ataacgttgg tatctaactg 840
184 gttaatgcca cttgatgata atagcatacc agatataaag gctgccgaga gatcacttga 900
185 cttccaattt ggattgttta tggaacaatt aacaacagga gattattcta agagcatgcg 960
186 gcgtatagtt aaaaaccgat tacctaagtt ctcaaaattc gaatcaagcc tagtgaatgg 1020
187 ttcatttgat tttattggta taaactatta ctcttctagt tatattagca atgccccttc 1080
188 acatggcaat gccaaaccca gttactcaac aaatcctatg accaatattt catttgaaaa 1140
189 acatgggata cccttaggtc caagggctgc ttcaatttgg atatatgttt atccatatat 1200
190 gtttatccaa gaggacttcg agatcttttg ttacatatta aaaataaata taacaatcct 1260
191 gcaattttca atcactgaaa atggtatgaa tgaattcaac gatgcaacac ttccagtaga 1320
192 agaagctctt ttgaatactt acagaattga ttactattac cgtcacttat actacattcg 1380
193 ttctgcaatc agggctggct caaatgtgaa gggtttttac gcatggtcat ttttggactg 1440
194 taatgaatgg tttgcaggct ttactgttcg ttttggatta aactttgtag attagaaaga 1500
195 tggattaaaa aggtacccta agctttctgc ccaatggtac aagaactttc tcaaaagaaa 1560
196 ctagctagta ttattaaaag aactttgtag tagattacag tacatcgttt gaagttgagt 1620
197 tggtgcacct aattaaataa aagaggttac tcttaacata tttttaggcc attcgttgtg 1680
198 aagttgttag gctgttattt ctattatact atgttgtagt aataagtgca ttgttgtacc 1740
199 agaagctatg atcataacta taggttgatc cttcatgtat cagtttgatg ttgagaatac 1800
200 tttgaattaa aagtcttttt ttattttttt aaaaaaaaaa aaaaaaaaaa ccccccccc 1859
201 //
202
+0
-202
pyfastaq/tests/data/sequences_test.embl.bad
0 ID seq1; SV 1; linear; mRNA; STD; PLN; 1859 BP.
1 XX
2 AC X56734; S46826;
3 XX
4 DT 12-SEP-1991 (Rel. 29, Created)
5 DT 25-NOV-2005 (Rel. 85, Last updated, Version 11)
6 XX
7 DE Trifolium repens mRNA for non-cyanogenic beta-glucosidase
8 XX
9 KW beta-glucosidase.
10 XX
11 OS Trifolium repens (white clover)
12 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
13 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
14 OC fabids; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
15 XX
16 RN [5]
17 RP 1-1859
18 RX DOI; 10.1007/BF00039495.
19 RX PUBMED; 1907511.
20 RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
21 RT "Nucleotide and derived amino acid sequence of the cyanogenic
22 RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)";
23 RL Plant Mol. Biol. 17(2):209-219(1991).
24 XX
25 RN [6]
26 RP 1-1859
27 RA Hughes M.A.;
28 RT ;
29 RL Submitted (19-NOV-1990) to the INSDC.
30 RL Hughes M.A., University of Newcastle Upon Tyne, Medical School, Newcastle
31 RL Upon Tyne, NE2 4HH, UK
32 XX
33 DR EuropePMC; PMC99098; 11752244.
34 XX
35 FH Key Location/Qualifiers
36 FH
37 FT source 1..1859
38 FT /organism="Trifolium repens"
39 FT /mol_type="mRNA"
40 FT /clone_lib="lambda gt10"
41 FT /clone="TRE361"
42 FT /tissue_type="leaves"
43 FT /db_xref="taxon:3899"
44 FT mRNA 1..1859
45 FT /experiment="experimental evidence, no additional details
46 FT recorded"
47 FT CDS 14..1495
48 FT /product="beta-glucosidase"
49 FT /EC_number="3.2.1.21"
50 FT /note="non-cyanogenic"
51 FT /db_xref="GOA:P26204"
52 FT /db_xref="InterPro:IPR001360"
53 FT /db_xref="InterPro:IPR013781"
54 FT /db_xref="InterPro:IPR017853"
55 FT /db_xref="InterPro:IPR018120"
56 FT /db_xref="UniProtKB/Swiss-Prot:P26204"
57 FT /protein_id="CAA40058.1"
58 FT /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
59 FT FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
60 FT DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
61 FT VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
62 FT CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
63 FT DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
64 FT IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
65 FT EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
66 FT IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
67 XX
68 SQ Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
69 aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt 60
70 cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag 120
71 tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga 180
72 aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata 240
73 tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta 300
74 caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc 360
75 ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaatcaa 420
76 atattacaac aaccttatca acgaactatt ggctaacggt atacaaccat ttgtaactct 480
77 ttttcattgg gatcttcccc aagtcttaga agatgagtat ggtggtttct taaactccgg 540
78 tgtaataaat gattttcgag actatacgga tctttgcttc aaggaatttg gagatagagt 600
79 gaggtattgg agtactctaa atgagccatg ggtgtttagc aattctggat atgcactagg 660
80 aacaaatgca ccaggtcgat gttcggcctc caacgtggcc aagcctggtg attctggaac 720
81 aggaccttat atagttacac acaatcaaat tcttgctcat gcagaagctg tacatgtgta 780
82 taagactaaa taccaggcat atcaaaaggg aaagataggc ataacgttgg tatctaactg 840
83 gttaatgcca cttgatgata atagcatacc agatataaag gctgccgaga gatcacttga 900
84 cttccaattt ggattgttta tggaacaatt aacaacagga gattattcta agagcatgcg 960
85 gcgtatagtt aaaaaccgat tacctaagtt ctcaaaattc gaatcaagcc tagtgaatgg 1020
86 ttcatttgat tttattggta taaactatta ctcttctagt tatattagca atgccccttc 1080
87 acatggcaat gccaaaccca gttactcaac aaatcctatg accaatattt catttgaaaa 1140
88 acatgggata cccttaggtc caagggctgc ttcaatttgg atatatgttt atccatatat 1200
89 gtttatccaa gaggacttcg agatcttttg ttacatatta aaaataaata taacaatcct 1260
90 gcaattttca atcactgaaa atggtatgaa tgaattcaac gatgcaacac ttccagtaga 1320
91 agaagctctt ttgaatactt acagaattga ttactattac cgtcacttat actacattcg 1380
92 ttctgcaatc agggctggct caaatgtgaa gggtttttac gcatggtcat ttttggactg 1440
93 taatgaatgg tttgcaggct ttactgttcg ttttggatta aactttgtag attagaaaga 1500
94 tggattaaaa aggtacccta agctttctgc ccaatggtac aagaactttc tcaaaagaaa 1560
95 ctagctagta ttattaaaag aactttgtag tagattacag tacatcgttt gaagttgagt 1620
96 tggtgcacct aattaaataa aagaggttac tcttaacata tttttaggcc attcgttgtg 1680
97 aagttgttag gctgttattt ctattatact atgttgtagt aataagtgca ttgttgtacc 1740
98 agaagctatg atcataacta taggttgatc cttcatgtat cagtttgatg ttgagaatac 1800
99 tttgaattaa aagtcttttt ttattttttt aaaaaaaaaa aaaaaaaaaa aaaaaaaaa 1859
100 //
101 ID seq2; SV 1; linear; mRNA; STD; PLN; 1859 BP.
102 XX
103 AC X56734; S46826;
104 XX
105 DT 12-SEP-1991 (Rel. 29, Created)
106 DT 25-NOV-2005 (Rel. 85, Last updated, Version 11)
107 XX
108 DE Trifolium repens mRNA for non-cyanogenic beta-glucosidase
109 XX
110 KW beta-glucosidase.
111 XX
112 OS Trifolium repens (white clover)
113 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
114 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
115 OC fabids; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
116 XX
117 RN [5]
118 RP 1-1859
119 RX DOI; 10.1007/BF00039495.
120 RX PUBMED; 1907511.
121 RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
122 RT "Nucleotide and derived amino acid sequence of the cyanogenic
123 RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)";
124 RL Plant Mol. Biol. 17(2):209-219(1991).
125 XX
126 RN [6]
127 RP 1-1859
128 RA Hughes M.A.;
129 RT ;
130 RL Submitted (19-NOV-1990) to the INSDC.
131 RL Hughes M.A., University of Newcastle Upon Tyne, Medical School, Newcastle
132 RL Upon Tyne, NE2 4HH, UK
133 XX
134 DR EuropePMC; PMC99098; 11752244.
135 XX
136 FH Key Location/Qualifiers
137 FH
138 FT source 1..1859
139 FT /organism="Trifolium repens"
140 FT /mol_type="mRNA"
141 FT /clone_lib="lambda gt10"
142 FT /clone="TRE361"
143 FT /tissue_type="leaves"
144 FT /db_xref="taxon:3899"
145 FT mRNA 1..1859
146 FT /experiment="experimental evidence, no additional details
147 FT recorded"
148 FT CDS 14..1495
149 FT /product="beta-glucosidase"
150 FT /EC_number="3.2.1.21"
151 FT /note="non-cyanogenic"
152 FT /db_xref="GOA:P26204"
153 FT /db_xref="InterPro:IPR001360"
154 FT /db_xref="InterPro:IPR013781"
155 FT /db_xref="InterPro:IPR017853"
156 FT /db_xref="InterPro:IPR018120"
157 FT /db_xref="UniProtKB/Swiss-Prot:P26204"
158 FT /protein_id="CAA40058.1"
159 FT /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
160 FT FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
161 FT DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
162 FT VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
163 FT CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
164 FT DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
165 FT IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
166 FT EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
167 FT IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
168 XX
169 aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt 60
170 cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag 120
171 tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga 180
172 aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata 240
173 tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta 300
174 caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc 360
175 ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaatcaa 420
176 atattacaac aaccttatca acgaactatt ggctaacggt atacaaccat ttgtaactct 480
177 ttttcattgg gatcttcccc aagtcttaga agatgagtat ggtggtttct taaactccgg 540
178 tgtaataaat gattttcgag actatacgga tctttgcttc aaggaatttg gagatagagt 600
179 gaggtattgg agtactctaa atgagccatg ggtgtttagc aattctggat atgcactagg 660
180 aacaaatgca ccaggtcgat gttcggcctc caacgtggcc aagcctggtg attctggaac 720
181 aggaccttat atagttacac acaatcaaat tcttgctcat gcagaagctg tacatgtgta 780
182 taagactaaa taccaggcat atcaaaaggg aaagataggc ataacgttgg tatctaactg 840
183 gttaatgcca cttgatgata atagcatacc agatataaag gctgccgaga gatcacttga 900
184 cttccaattt ggattgttta tggaacaatt aacaacagga gattattcta agagcatgcg 960
185 gcgtatagtt aaaaaccgat tacctaagtt ctcaaaattc gaatcaagcc tagtgaatgg 1020
186 ttcatttgat tttattggta taaactatta ctcttctagt tatattagca atgccccttc 1080
187 acatggcaat gccaaaccca gttactcaac aaatcctatg accaatattt catttgaaaa 1140
188 acatgggata cccttaggtc caagggctgc ttcaatttgg atatatgttt atccatatat 1200
189 gtttatccaa gaggacttcg agatcttttg ttacatatta aaaataaata taacaatcct 1260
190 gcaattttca atcactgaaa atggtatgaa tgaattcaac gatgcaacac ttccagtaga 1320
191 agaagctctt ttgaatactt acagaattga ttactattac cgtcacttat actacattcg 1380
192 ttctgcaatc agggctggct caaatgtgaa gggtttttac gcatggtcat ttttggactg 1440
193 taatgaatgg tttgcaggct ttactgttcg ttttggatta aactttgtag attagaaaga 1500
194 tggattaaaa aggtacccta agctttctgc ccaatggtac aagaactttc tcaaaagaaa 1560
195 ctagctagta ttattaaaag aactttgtag tagattacag tacatcgttt gaagttgagt 1620
196 tggtgcacct aattaaataa aagaggttac tcttaacata tttttaggcc attcgttgtg 1680
197 aagttgttag gctgttattt ctattatact atgttgtagt aataagtgca ttgttgtacc 1740
198 agaagctatg atcataacta taggttgatc cttcatgtat cagtttgatg ttgagaatac 1800
199 tttgaattaa aagtcttttt ttattttttt aaaaaaaaaa aaaaaaaaaa ccccccccc 1859
200 //
201
+0
-202
pyfastaq/tests/data/sequences_test.embl.bad2
0 ID seq1; SV 1; linear; mRNA; STD; PLN; 1859 BP.
1 XX
2 AC X56734; S46826;
3 XX
4 DT 12-SEP-1991 (Rel. 29, Created)
5 DT 25-NOV-2005 (Rel. 85, Last updated, Version 11)
6 XX
7 DE Trifolium repens mRNA for non-cyanogenic beta-glucosidase
8 XX
9 KW beta-glucosidase.
10 XX
11 OS Trifolium repens (white clover)
12 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
13 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
14 OC fabids; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
15 XX
16 RN [5]
17 RP 1-1859
18 RX DOI; 10.1007/BF00039495.
19 RX PUBMED; 1907511.
20 RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
21 RT "Nucleotide and derived amino acid sequence of the cyanogenic
22 RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)";
23 RL Plant Mol. Biol. 17(2):209-219(1991).
24 XX
25 RN [6]
26 RP 1-1859
27 RA Hughes M.A.;
28 RT ;
29 RL Submitted (19-NOV-1990) to the INSDC.
30 RL Hughes M.A., University of Newcastle Upon Tyne, Medical School, Newcastle
31 RL Upon Tyne, NE2 4HH, UK
32 XX
33 DR EuropePMC; PMC99098; 11752244.
34 XX
35 FH Key Location/Qualifiers
36 FH
37 FT source 1..1859
38 FT /organism="Trifolium repens"
39 FT /mol_type="mRNA"
40 FT /clone_lib="lambda gt10"
41 FT /clone="TRE361"
42 FT /tissue_type="leaves"
43 FT /db_xref="taxon:3899"
44 FT mRNA 1..1859
45 FT /experiment="experimental evidence, no additional details
46 FT recorded"
47 FT CDS 14..1495
48 FT /product="beta-glucosidase"
49 FT /EC_number="3.2.1.21"
50 FT /note="non-cyanogenic"
51 FT /db_xref="GOA:P26204"
52 FT /db_xref="InterPro:IPR001360"
53 FT /db_xref="InterPro:IPR013781"
54 FT /db_xref="InterPro:IPR017853"
55 FT /db_xref="InterPro:IPR018120"
56 FT /db_xref="UniProtKB/Swiss-Prot:P26204"
57 FT /protein_id="CAA40058.1"
58 FT /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
59 FT FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
60 FT DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
61 FT VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
62 FT CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
63 FT DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
64 FT IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
65 FT EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
66 FT IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
67 XX
68 SQ Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
69 aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt 60
70 cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag 120
71 tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga 180
72 aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata 240
73 tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta 300
74 caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc 360
75 ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaatcaa 420
76 atattacaac aaccttatca acgaactatt ggctaacggt atacaaccat ttgtaactct 480
77 ttttcattgg gatcttcccc aagtcttaga agatgagtat ggtggtttct taaactccgg 540
78 tgtaataaat gattttcgag actatacgga tctttgcttc aaggaatttg gagatagagt 600
79 gaggtattgg agtactctaa atgagccatg ggtgtttagc aattctggat atgcactagg 660
80 aacaaatgca ccaggtcgat gttcggcctc caacgtggcc aagcctggtg attctggaac 720
81 aggaccttat atagttacac acaatcaaat tcttgctcat gcagaagctg tacatgtgta 780
82 taagactaaa taccaggcat atcaaaaggg aaagataggc ataacgttgg tatctaactg 840
83 gttaatgcca cttgatgata atagcatacc agatataaag gctgccgaga gatcacttga 900
84 cttccaattt ggattgttta tggaacaatt aacaacagga gattattcta agagcatgcg 960
85 gcgtatagtt aaaaaccgat tacctaagtt ctcaaaattc gaatcaagcc tagtgaatgg 1020
86 ttcatttgat tttattggta taaactatta ctcttctagt tatattagca atgccccttc 1080
87 acatggcaat gccaaaccca gttactcaac aaatcctatg accaatattt catttgaaaa 1140
88 acatgggata cccttaggtc caagggctgc ttcaatttgg atatatgttt atccatatat 1200
89 gtttatccaa gaggacttcg agatcttttg ttacatatta aaaataaata taacaatcct 1260
90 gcaattttca atcactgaaa atggtatgaa tgaattcaac gatgcaacac ttccagtaga 1320
91 agaagctctt ttgaatactt acagaattga ttactattac cgtcacttat actacattcg 1380
92 ttctgcaatc agggctggct caaatgtgaa gggtttttac gcatggtcat ttttggactg 1440
93 taatgaatgg tttgcaggct ttactgttcg ttttggatta aactttgtag attagaaaga 1500
94 tggattaaaa aggtacccta agctttctgc ccaatggtac aagaactttc tcaaaagaaa 1560
95 ctagctagta ttattaaaag aactttgtag tagattacag tacatcgttt gaagttgagt 1620
96 tggtgcacct aattaaataa aagaggttac tcttaacata tttttaggcc attcgttgtg 1680
97 aagttgttag gctgttattt ctattatact atgttgtagt aataagtgca ttgttgtacc 1740
98 agaagctatg atcataacta taggttgatc cttcatgtat cagtttgatg ttgagaatac 1800
99 tttgaattaa aagtcttttt ttattttttt aaaaaaaaaa aaaaaaaaaa aaaaaaaaa 1859
100 ID seq2; SV 1; linear; mRNA; STD; PLN; 1859 BP.
101 XX
102 AC X56734; S46826;
103 XX
104 DT 12-SEP-1991 (Rel. 29, Created)
105 DT 25-NOV-2005 (Rel. 85, Last updated, Version 11)
106 XX
107 DE Trifolium repens mRNA for non-cyanogenic beta-glucosidase
108 XX
109 KW beta-glucosidase.
110 XX
111 OS Trifolium repens (white clover)
112 OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
113 OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids;
114 OC fabids; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
115 XX
116 RN [5]
117 RP 1-1859
118 RX DOI; 10.1007/BF00039495.
119 RX PUBMED; 1907511.
120 RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
121 RT "Nucleotide and derived amino acid sequence of the cyanogenic
122 RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)";
123 RL Plant Mol. Biol. 17(2):209-219(1991).
124 XX
125 RN [6]
126 RP 1-1859
127 RA Hughes M.A.;
128 RT ;
129 RL Submitted (19-NOV-1990) to the INSDC.
130 RL Hughes M.A., University of Newcastle Upon Tyne, Medical School, Newcastle
131 RL Upon Tyne, NE2 4HH, UK
132 XX
133 DR EuropePMC; PMC99098; 11752244.
134 XX
135 FH Key Location/Qualifiers
136 FH
137 FT source 1..1859
138 FT /organism="Trifolium repens"
139 FT /mol_type="mRNA"
140 FT /clone_lib="lambda gt10"
141 FT /clone="TRE361"
142 FT /tissue_type="leaves"
143 FT /db_xref="taxon:3899"
144 FT mRNA 1..1859
145 FT /experiment="experimental evidence, no additional details
146 FT recorded"
147 FT CDS 14..1495
148 FT /product="beta-glucosidase"
149 FT /EC_number="3.2.1.21"
150 FT /note="non-cyanogenic"
151 FT /db_xref="GOA:P26204"
152 FT /db_xref="InterPro:IPR001360"
153 FT /db_xref="InterPro:IPR013781"
154 FT /db_xref="InterPro:IPR017853"
155 FT /db_xref="InterPro:IPR018120"
156 FT /db_xref="UniProtKB/Swiss-Prot:P26204"
157 FT /protein_id="CAA40058.1"
158 FT /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
159 FT FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
160 FT DQNMDSYRFSISWPRILPKGKLSGGINHEGIKYYNNLINELLANGIQPFVTLFHWDLPQ
161 FT VLEDEYGGFLNSGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWVFSNSGYALGTNAPGR
162 FT CSASNVAKPGDSGTGPYIVTHNQILAHAEAVHVYKTKYQAYQKGKIGITLVSNWLMPLD
163 FT DNSIPDIKAAERSLDFQFGLFMEQLTTGDYSKSMRRIVKNRLPKFSKFESSLVNGSFDF
164 FT IGINYYSSSYISNAPSHGNAKPSYSTNPMTNISFEKHGIPLGPRAASIWIYVYPYMFIQ
165 FT EDFEIFCYILKINITILQFSITENGMNEFNDATLPVEEALLNTYRIDYYYRHLYYIRSA
166 FT IRAGSNVKGFYAWSFLDCNEWFAGFTVRFGLNFVD"
167 XX
168 SQ Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
169 aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt 60
170 cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag 120
171 tcggagcagt tttcctcgtg gcttcatctt tggtgctgga tcttcagcat accaatttga 180
172 aggtgcagta aacgaaggcg gtagaggacc aagtatttgg gataccttca cccataaata 240
173 tccagaaaaa ataagggatg gaagcaatgc agacatcacg gttgaccaat atcaccgcta 300
174 caaggaagat gttgggatta tgaaggatca aaatatggat tcgtatagat tctcaatctc 360
175 ttggccaaga atactcccaa agggaaagtt gagcggaggc ataaatcacg aaggaatcaa 420
176 atattacaac aaccttatca acgaactatt ggctaacggt atacaaccat ttgtaactct 480
177 ttttcattgg gatcttcccc aagtcttaga agatgagtat ggtggtttct taaactccgg 540
178 tgtaataaat gattttcgag actatacgga tctttgcttc aaggaatttg gagatagagt 600
179 gaggtattgg agtactctaa atgagccatg ggtgtttagc aattctggat atgcactagg 660
180 aacaaatgca ccaggtcgat gttcggcctc caacgtggcc aagcctggtg attctggaac 720
181 aggaccttat atagttacac acaatcaaat tcttgctcat gcagaagctg tacatgtgta 780
182 taagactaaa taccaggcat atcaaaaggg aaagataggc ataacgttgg tatctaactg 840
183 gttaatgcca cttgatgata atagcatacc agatataaag gctgccgaga gatcacttga 900
184 cttccaattt ggattgttta tggaacaatt aacaacagga gattattcta agagcatgcg 960
185 gcgtatagtt aaaaaccgat tacctaagtt ctcaaaattc gaatcaagcc tagtgaatgg 1020
186 ttcatttgat tttattggta taaactatta ctcttctagt tatattagca atgccccttc 1080
187 acatggcaat gccaaaccca gttactcaac aaatcctatg accaatattt catttgaaaa 1140
188 acatgggata cccttaggtc caagggctgc ttcaatttgg atatatgttt atccatatat 1200
189 gtttatccaa gaggacttcg agatcttttg ttacatatta aaaataaata taacaatcct 1260
190 gcaattttca atcactgaaa atggtatgaa tgaattcaac gatgcaacac ttccagtaga 1320
191 agaagctctt ttgaatactt acagaattga ttactattac cgtcacttat actacattcg 1380
192 ttctgcaatc agggctggct caaatgtgaa gggtttttac gcatggtcat ttttggactg 1440
193 taatgaatgg tttgcaggct ttactgttcg ttttggatta aactttgtag attagaaaga 1500
194 tggattaaaa aggtacccta agctttctgc ccaatggtac aagaactttc tcaaaagaaa 1560
195 ctagctagta ttattaaaag aactttgtag tagattacag tacatcgttt gaagttgagt 1620
196 tggtgcacct aattaaataa aagaggttac tcttaacata tttttaggcc attcgttgtg 1680
197 aagttgttag gctgttattt ctattatact atgttgtagt aataagtgca ttgttgtacc 1740
198 agaagctatg atcataacta taggttgatc cttcatgtat cagtttgatg ttgagaatac 1800
199 tttgaattaa aagtcttttt ttattttttt aaaaaaaaaa aaaaaaaaaa ccccccccc 1859
200 //
201
+0
-64
pyfastaq/tests/data/sequences_test.embl.to_fasta
0 >seq1
1 aaacaaaccaaatatggattttattgtagccatatttgctctgtttgttattagctcatt
2 cacaattacttccacaaatgcagttgaagcttctactcttcttgacataggtaacctgag
3 tcggagcagttttcctcgtggcttcatctttggtgctggatcttcagcataccaatttga
4 aggtgcagtaaacgaaggcggtagaggaccaagtatttgggataccttcacccataaata
5 tccagaaaaaataagggatggaagcaatgcagacatcacggttgaccaatatcaccgcta
6 caaggaagatgttgggattatgaaggatcaaaatatggattcgtatagattctcaatctc
7 ttggccaagaatactcccaaagggaaagttgagcggaggcataaatcacgaaggaatcaa
8 atattacaacaaccttatcaacgaactattggctaacggtatacaaccatttgtaactct
9 ttttcattgggatcttccccaagtcttagaagatgagtatggtggtttcttaaactccgg
10 tgtaataaatgattttcgagactatacggatctttgcttcaaggaatttggagatagagt
11 gaggtattggagtactctaaatgagccatgggtgtttagcaattctggatatgcactagg
12 aacaaatgcaccaggtcgatgttcggcctccaacgtggccaagcctggtgattctggaac
13 aggaccttatatagttacacacaatcaaattcttgctcatgcagaagctgtacatgtgta
14 taagactaaataccaggcatatcaaaagggaaagataggcataacgttggtatctaactg
15 gttaatgccacttgatgataatagcataccagatataaaggctgccgagagatcacttga
16 cttccaatttggattgtttatggaacaattaacaacaggagattattctaagagcatgcg
17 gcgtatagttaaaaaccgattacctaagttctcaaaattcgaatcaagcctagtgaatgg
18 ttcatttgattttattggtataaactattactcttctagttatattagcaatgccccttc
19 acatggcaatgccaaacccagttactcaacaaatcctatgaccaatatttcatttgaaaa
20 acatgggatacccttaggtccaagggctgcttcaatttggatatatgtttatccatatat
21 gtttatccaagaggacttcgagatcttttgttacatattaaaaataaatataacaatcct
22 gcaattttcaatcactgaaaatggtatgaatgaattcaacgatgcaacacttccagtaga
23 agaagctcttttgaatacttacagaattgattactattaccgtcacttatactacattcg
24 ttctgcaatcagggctggctcaaatgtgaagggtttttacgcatggtcatttttggactg
25 taatgaatggtttgcaggctttactgttcgttttggattaaactttgtagattagaaaga
26 tggattaaaaaggtaccctaagctttctgcccaatggtacaagaactttctcaaaagaaa
27 ctagctagtattattaaaagaactttgtagtagattacagtacatcgtttgaagttgagt
28 tggtgcacctaattaaataaaagaggttactcttaacatatttttaggccattcgttgtg
29 aagttgttaggctgttatttctattatactatgttgtagtaataagtgcattgttgtacc
30 agaagctatgatcataactataggttgatccttcatgtatcagtttgatgttgagaatac
31 tttgaattaaaagtctttttttatttttttaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
32 >seq2
33 aaacaaaccaaatatggattttattgtagccatatttgctctgtttgttattagctcatt
34 cacaattacttccacaaatgcagttgaagcttctactcttcttgacataggtaacctgag
35 tcggagcagttttcctcgtggcttcatctttggtgctggatcttcagcataccaatttga
36 aggtgcagtaaacgaaggcggtagaggaccaagtatttgggataccttcacccataaata
37 tccagaaaaaataagggatggaagcaatgcagacatcacggttgaccaatatcaccgcta
38 caaggaagatgttgggattatgaaggatcaaaatatggattcgtatagattctcaatctc
39 ttggccaagaatactcccaaagggaaagttgagcggaggcataaatcacgaaggaatcaa
40 atattacaacaaccttatcaacgaactattggctaacggtatacaaccatttgtaactct
41 ttttcattgggatcttccccaagtcttagaagatgagtatggtggtttcttaaactccgg
42 tgtaataaatgattttcgagactatacggatctttgcttcaaggaatttggagatagagt
43 gaggtattggagtactctaaatgagccatgggtgtttagcaattctggatatgcactagg
44 aacaaatgcaccaggtcgatgttcggcctccaacgtggccaagcctggtgattctggaac
45 aggaccttatatagttacacacaatcaaattcttgctcatgcagaagctgtacatgtgta
46 taagactaaataccaggcatatcaaaagggaaagataggcataacgttggtatctaactg
47 gttaatgccacttgatgataatagcataccagatataaaggctgccgagagatcacttga
48 cttccaatttggattgtttatggaacaattaacaacaggagattattctaagagcatgcg
49 gcgtatagttaaaaaccgattacctaagttctcaaaattcgaatcaagcctagtgaatgg
50 ttcatttgattttattggtataaactattactcttctagttatattagcaatgccccttc
51 acatggcaatgccaaacccagttactcaacaaatcctatgaccaatatttcatttgaaaa
52 acatgggatacccttaggtccaagggctgcttcaatttggatatatgtttatccatatat
53 gtttatccaagaggacttcgagatcttttgttacatattaaaaataaatataacaatcct
54 gcaattttcaatcactgaaaatggtatgaatgaattcaacgatgcaacacttccagtaga
55 agaagctcttttgaatacttacagaattgattactattaccgtcacttatactacattcg
56 ttctgcaatcagggctggctcaaatgtgaagggtttttacgcatggtcatttttggactg
57 taatgaatggtttgcaggctttactgttcgttttggattaaactttgtagattagaaaga
58 tggattaaaaaggtaccctaagctttctgcccaatggtacaagaactttctcaaaagaaa
59 ctagctagtattattaaaagaactttgtagtagattacagtacatcgtttgaagttgagt
60 tggtgcacctaattaaataaaagaggttactcttaacatatttttaggccattcgttgtg
61 aagttgttaggctgttatttctattatactatgttgtagtaataagtgcattgttgtacc
62 agaagctatgatcataactataggttgatccttcatgtatcagtttgatgttgagaatac
63 tttgaattaaaagtctttttttatttttttaaaaaaaaaaaaaaaaaaaaccccccccc
+0
-19
pyfastaq/tests/data/sequences_test.fa
0 >1
1 ACGTA
2 >2
3 A
4
5 C
6 GT
7
8 A
9
10 >3
11
12
13 ACGTA
14 >4
15 ACGTA
16
17
18
+0
-4
pyfastaq/tests/data/sequences_test.fa.ids
0 1
1 2
2 3
3 4
+0
-17
pyfastaq/tests/data/sequences_test.fa.qual
0 >1
1 40 40 40
2 40 40
3
4 >2
5 40
6 40
7
8 40
9 40 40
10 >3
11
12 40 40 40 40 40
13
14 >4
15 40 40 40 40 40
16
+0
-17
pyfastaq/tests/data/sequences_test.fa.qual.bad
0 >1
1 40 40 40
2 40 40
3
4 >3
5 40
6 40
7
8 40
9 40 40
10 >3
11
12 40 40 40 40 40
13
14 >4
15 40 40 40 40 40
16
+0
-16
pyfastaq/tests/data/sequences_test.fasta_to_fastq.fq
0 @1
1 ACGTA
2 +
3 IIIII
4 @2
5 ACGTA
6 +
7 IIIII
8 @3
9 ACGTA
10 +
11 IIIII
12 @4
13 ACGTA
14 +
15 IIIII
+0
-170
pyfastaq/tests/data/sequences_test.gbk
0 LOCUS NAME1 5028 bp DNA PLN 21-JUN-1999
1 DEFINITION Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
2 (AXL2) and Rev7p (REV7) genes, complete cds.
3 ACCESSION U49845
4 VERSION U49845.1 GI:1293613
5 KEYWORDS .
6 SOURCE Saccharomyces cerevisiae (baker's yeast)
7 ORGANISM Saccharomyces cerevisiae
8 Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
9 Saccharomycetales; Saccharomycetaceae; Saccharomyces.
10 REFERENCE 1 (bases 1 to 5028)
11 AUTHORS Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
12 TITLE Cloning and sequence of REV7, a gene whose function is required for
13 DNA damage-induced mutagenesis in Saccharomyces cerevisiae
14 JOURNAL Yeast 10 (11), 1503-1509 (1994)
15 PUBMED 7871890
16 REFERENCE 2 (bases 1 to 5028)
17 AUTHORS Roemer,T., Madden,K., Chang,J. and Snyder,M.
18 TITLE Selection of axial growth sites in yeast requires Axl2p, a novel
19 plasma membrane glycoprotein
20 JOURNAL Genes Dev. 10 (7), 777-793 (1996)
21 PUBMED 8846915
22 REFERENCE 3 (bases 1 to 5028)
23 AUTHORS Roemer,T.
24 TITLE Direct Submission
25 JOURNAL Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University, New
26 Haven, CT, USA
27 FEATURES Location/Qualifiers
28 source 1..5028
29 /organism="Saccharomyces cerevisiae"
30 /db_xref="taxon:4932"
31 /chromosome="IX"
32 /map="9"
33 CDS <1..206
34 /codon_start=3
35 /product="TCP1-beta"
36 /protein_id="AAA98665.1"
37 /db_xref="GI:1293614"
38 /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
39 AEVLLRVDNIIRARPRTANRQHM"
40 gene 687..3158
41 /gene="AXL2"
42 CDS 687..3158
43 /gene="AXL2"
44 /note="plasma membrane glycoprotein"
45 /codon_start=1
46 /function="required for axial budding pattern of S.
47 cerevisiae"
48 /product="Axl2p"
49 /protein_id="AAA98666.1"
50 /db_xref="GI:1293615"
51 /translation="MTQLQISLLLTATISLLHLVVATPYEAYPIGKQYPPVARVNESF
52 TFQISNDTYKSSVDKTAQITYNCFDLPSWLSFDSSSRTFSGEPSSDLLSDANTTLYFN
53 VILEGTDSADSTSLNNTYQFVVTNRPSISLSSDFNLLALLKNYGYTNGKNALKLDPNE
54 VFNVTFDRSMFTNEESIVSYYGRSQLYNAPLPNWLFFDSGELKFTGTAPVINSAIAPE
55 TSYSFVIIATDIEGFSAVEVEFELVIGAHQLTTSIQNSLIINVTDTGNVSYDLPLNYV
56 YLDDDPISSDKLGSINLLDAPDWVALDNATISGSVPDELLGKNSNPANFSVSIYDTYG
57 DVIYFNFEVVSTTDLFAISSLPNINATRGEWFSYYFLPSQFTDYVNTNVSLEFTNSSQ
58 DHDWVKFQSSNLTLAGEVPKNFDKLSLGLKANQGSQSQELYFNIIGMDSKITHSNHSA
59 NATSTRSSHHSTSTSSYTSSTYTAKISSTSAAATSSAPAALPAANKTSSHNKKAVAIA
60 CGVAIPLGVILVALICFLIFWRRRRENPDDENLPHAISGPDLNNPANKPNQENATPLN
61 NPFDDDASSYDDTSIARRLAALNTLKLDNHSATESDISSVDEKRDSLSGMNTYNDQFQ
62 SQSKEELLAKPPVQPPESPFFDPQNRSSSVYMDSEPAVNKSWRYTGNLSPVSDIVRDS
63 YGSQKTVDTEKLFDLEAPEKEKRTSRDVTMSSLDPWNSNISPSPVRKSVTPSPYNVTK
64 HRNRHLQNIQDSQSGKNGITPTTMSTSSSDDFVPVKDGENFCWVHSMEPDRRPSKKRL
65 VDFSNKSNVNVGQVKDIHGRIPEML"
66 gene complement(3300..4037)
67 /gene="REV7"
68 CDS complement(3300..4037)
69 /gene="REV7"
70 /codon_start=1
71 /product="Rev7p"
72 /protein_id="AAA98667.1"
73 /db_xref="GI:1293616"
74 /translation="MNRWVEKWLRVYLKCYINLILFYRNVYPPQSFDYTTYQSFNLPQ
75 FVPINRHPALIDYIEELILDVLSKLTHVYRFSICIINKKNDLCIEKYVLDFSELQHVD
76 KDDQIITETEVFDEFRSSLNSLIMHLEKLPKVNDDTITFEAVINAIELELGHKLDRNR
77 RVDSLEEKAEIERDSNWVKCQEDENLPDNNGFQPPKIKLTSLVGSDVGPLIIHQFSEK
78 LISGDDKILNGVYSQYEEGESIFGSLF"
79 ORIGIN
80 1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
81 61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
82 121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
83 181 tgccatgact cagattctaa ttttaagcta ttcaatttct ctttgatc
84 //
85 LOCUS NAME2 5028 bp DNA PLN 21-JUN-1999
86 DEFINITION Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
87 (AXL2) and Rev7p (REV7) genes, complete cds.
88 ACCESSION U49845
89 VERSION U49845.1 GI:1293613
90 KEYWORDS .
91 SOURCE Saccharomyces cerevisiae (baker's yeast)
92 ORGANISM Saccharomyces cerevisiae
93 Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
94 Saccharomycetales; Saccharomycetaceae; Saccharomyces.
95 REFERENCE 1 (bases 1 to 5028)
96 AUTHORS Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
97 TITLE Cloning and sequence of REV7, a gene whose function is required for
98 DNA damage-induced mutagenesis in Saccharomyces cerevisiae
99 JOURNAL Yeast 10 (11), 1503-1509 (1994)
100 PUBMED 7871890
101 REFERENCE 2 (bases 1 to 5028)
102 AUTHORS Roemer,T., Madden,K., Chang,J. and Snyder,M.
103 TITLE Selection of axial growth sites in yeast requires Axl2p, a novel
104 plasma membrane glycoprotein
105 JOURNAL Genes Dev. 10 (7), 777-793 (1996)
106 PUBMED 8846915
107 REFERENCE 3 (bases 1 to 5028)
108 AUTHORS Roemer,T.
109 TITLE Direct Submission
110 JOURNAL Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University, New
111 Haven, CT, USA
112 FEATURES Location/Qualifiers
113 source 1..5028
114 /organism="Saccharomyces cerevisiae"
115 /db_xref="taxon:4932"
116 /chromosome="IX"
117 /map="9"
118 CDS <1..206
119 /codon_start=3
120 /product="TCP1-beta"
121 /protein_id="AAA98665.1"
122 /db_xref="GI:1293614"
123 /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
124 AEVLLRVDNIIRARPRTANRQHM"
125 gene 687..3158
126 /gene="AXL2"
127 CDS 687..3158
128 /gene="AXL2"
129 /note="plasma membrane glycoprotein"
130 /codon_start=1
131 /function="required for axial budding pattern of S.
132 cerevisiae"
133 /product="Axl2p"
134 /protein_id="AAA98666.1"
135 /db_xref="GI:1293615"
136 /translation="MTQLQISLLLTATISLLHLVVATPYEAYPIGKQYPPVARVNESF
137 TFQISNDTYKSSVDKTAQITYNCFDLPSWLSFDSSSRTFSGEPSSDLLSDANTTLYFN
138 VILEGTDSADSTSLNNTYQFVVTNRPSISLSSDFNLLALLKNYGYTNGKNALKLDPNE
139 VFNVTFDRSMFTNEESIVSYYGRSQLYNAPLPNWLFFDSGELKFTGTAPVINSAIAPE
140 TSYSFVIIATDIEGFSAVEVEFELVIGAHQLTTSIQNSLIINVTDTGNVSYDLPLNYV
141 YLDDDPISSDKLGSINLLDAPDWVALDNATISGSVPDELLGKNSNPANFSVSIYDTYG
142 DVIYFNFEVVSTTDLFAISSLPNINATRGEWFSYYFLPSQFTDYVNTNVSLEFTNSSQ
143 DHDWVKFQSSNLTLAGEVPKNFDKLSLGLKANQGSQSQELYFNIIGMDSKITHSNHSA
144 NATSTRSSHHSTSTSSYTSSTYTAKISSTSAAATSSAPAALPAANKTSSHNKKAVAIA
145 CGVAIPLGVILVALICFLIFWRRRRENPDDENLPHAISGPDLNNPANKPNQENATPLN
146 NPFDDDASSYDDTSIARRLAALNTLKLDNHSATESDISSVDEKRDSLSGMNTYNDQFQ
147 SQSKEELLAKPPVQPPESPFFDPQNRSSSVYMDSEPAVNKSWRYTGNLSPVSDIVRDS
148 YGSQKTVDTEKLFDLEAPEKEKRTSRDVTMSSLDPWNSNISPSPVRKSVTPSPYNVTK
149 HRNRHLQNIQDSQSGKNGITPTTMSTSSSDDFVPVKDGENFCWVHSMEPDRRPSKKRL
150 VDFSNKSNVNVGQVKDIHGRIPEML"
151 gene complement(3300..4037)
152 /gene="REV7"
153 CDS complement(3300..4037)
154 /gene="REV7"
155 /codon_start=1
156 /product="Rev7p"
157 /protein_id="AAA98667.1"
158 /db_xref="GI:1293616"
159 /translation="MNRWVEKWLRVYLKCYINLILFYRNVYPPQSFDYTTYQSFNLPQ
160 FVPINRHPALIDYIEELILDVLSKLTHVYRFSICIINKKNDLCIEKYVLDFSELQHVD
161 KDDQIITETEVFDEFRSSLNSLIMHLEKLPKVNDDTITFEAVINAIELELGHKLDRNR
162 RVDSLEEKAEIERDSNWVKCQEDENLPDNNGFQPPKIKLTSLVGSDVGPLIIHQFSEK
163 LISGDDKILNGVYSQYEEGESIFGSLF"
164 ORIGIN
165 1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
166 61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
167 121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
168 181 tgccatgact cagattctaa ttttaagcta ttcaatttct ctttgaaa
169 //
+0
-10
pyfastaq/tests/data/sequences_test.gbk.to_fasta
0 >NAME1
1 gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaaccattg
2 ccgacatgagacagttaggtatcgtcgagagttacaagctaaaacgagcagtagtcagct
3 ctgcatctgaagccgctgaagttctactaagggtggataacatcatccgtgcaagaccaa
4 tgccatgactcagattctaattttaagctattcaatttctctttgatc
5 >NAME2
6 gatcctccatatacaacggtatctccacctcaggtttagatctcaacaacggaaccattg
7 ccgacatgagacagttaggtatcgtcgagagttacaagctaaaacgagcagtagtcagct
8 ctgcatctgaagccgctgaagttctactaagggtggataacatcatccgtgcaagaccaa
9 tgccatgactcagattctaattttaagctattcaatttctctttgaaa
+0
-12
pyfastaq/tests/data/sequences_test.line_length3.fa
0 >1
1 ACG
2 TA
3 >2
4 ACG
5 TA
6 >3
7 ACG
8 TA
9 >4
10 ACG
11 TA
+0
-6
pyfastaq/tests/data/sequences_test.to_fasta.strip_after_whitespace_non_unique.in.fa
0 >1 spam
1 ACGT
2 >1 eggs
3 A
4 >2
5 GTTTG
+0
-6
pyfastaq/tests/data/sequences_test.to_fasta.strip_after_whitespace_non_unique.out.fa
0 >1
1 ACGT
2 >1
3 A
4 >2
5 GTTTG
+0
-6
pyfastaq/tests/data/sequences_test.to_fasta.strip_after_whitespace_unique.in.fa
0 >1 abcde
1 ACGT
2 >2 abcde
3 G
4 >3 hello
5 GTACCA
+0
-6
pyfastaq/tests/data/sequences_test.to_fasta.strip_after_whitespace_unique.out.fa
0 >1
1 ACGT
2 >2
3 G
4 >3
5 GTACCA
+0
-4
pyfastaq/tests/data/test_acgtn_only.expected.fa
0 >seq1
1 acgtACGTnN
2 >seq2
3 aNcNgNNT
+0
-4
pyfastaq/tests/data/test_acgtn_only.in.fa
0 >seq1
1 acgtACGTnN
2 >seq2
3 aXcRg.?T
22 import sys
33 import filecmp
44 import os
5 import tempfile
56 import unittest
67 from pyfastaq import tasks, sequences
78
171172 self.assertTrue(filecmp.cmp(correct_files[i], outfile))
172173 os.unlink(outfile)
173174
175 def test_regex_check_comments_filter(self):
176 '''When check_comments is true, and the regex is in the comment'''
177 infile = tempfile.NamedTemporaryFile(suffix=".fa", mode="w+")
178 infile.write(
179 ">read1 foo=bar\nAGCT\n>read2 bar=foo\nGGG\n>read3\nGGGG\n>read4 foo=ba\n"
180 "GCA\n>read5foo=bar\nGCAT"
181 )
182 infile.seek(0)
183 regex = r'\sfoo=bar'
184 outfile = tempfile.NamedTemporaryFile(suffix=".fa", mode="w+")
185
186 tasks.filter(infile.name, outfile.name, regex=regex, check_comments=True)
187 with open(outfile.name) as handle:
188 actual = handle.read()
189
190 expected = ">read1 foo=bar\nAGCT\n"
191
192 self.assertEqual(actual, expected)
193
174194 def test_ids_from_file_filter(self):
175195 '''Test that can extract reads from a file of read names'''
176196 infile = os.path.join(data_dir, 'sequences_test_filter_by_ids_file.fa')
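The added `test_regex_check_comments_filter` test above expects that, with `check_comments=True`, only `read1` survives: its header contains whitespace followed by `foo=bar`, while `read4` (`foo=ba`) and `read5foo=bar` (no whitespace before `foo`) are dropped. The following is a minimal standalone sketch of that behaviour using only the standard library. It is not pyfastaq's implementation: the function name `filter_fasta_by_header` is hypothetical, and it assumes the regex is searched against the whole header line (name plus comment).

```python
import re


def filter_fasta_by_header(text, pattern):
    """Keep FASTA records whose full header line matches `pattern`.

    Hypothetical helper for illustration only; it assumes the regex is
    searched against everything after '>' (name and comment together).
    """
    regex = re.compile(pattern)
    kept = []
    keep = False
    for line in text.splitlines():
        if line.startswith(">"):
            # A new record starts: decide once per header whether to keep it.
            keep = regex.search(line[1:]) is not None
        if keep:
            kept.append(line)
    return "".join(line + "\n" for line in kept)


# Same input and regex as the test in the diff above.
fasta = (
    ">read1 foo=bar\nAGCT\n>read2 bar=foo\nGGG\n>read3\nGGGG\n>read4 foo=ba\n"
    "GCA\n>read5foo=bar\nGCAT"
)
result = filter_fasta_by_header(fasta, r"\sfoo=bar")
print(result, end="")
```

The pattern is written as a raw string (`r"\sfoo=bar"`) so that `\s` reaches the `re` module unchanged; in a plain string literal, recent Python versions warn about it as an invalid escape sequence.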
0 Metadata-Version: 2.1
1 Name: pyfastaq
2 Version: 3.17.0
3 Summary: Script to manipulate FASTA and FASTQ files, plus API for developers
4 Home-page: https://github.com/sanger-pathogens/Fastaq
5 Author: Martin Hunt
6 Author-email: path-help@sanger.ac.uk
7 License: GPLv3
8 Platform: UNKNOWN
9 Classifier: Development Status :: 4 - Beta
10 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
11 Classifier: Programming Language :: Python :: 3 :: Only
12 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
13 License-File: LICENSE
14 License-File: AUTHORS
15
16 UNKNOWN
17
0 AUTHORS
1 LICENSE
2 MANIFEST.in
3 README.md
4 setup.py
5 pyfastaq/__init__.py
6 pyfastaq/caf.py
7 pyfastaq/genetic_codes.py
8 pyfastaq/intervals.py
9 pyfastaq/sequences.py
10 pyfastaq/tasks.py
11 pyfastaq/utils.py
12 pyfastaq.egg-info/PKG-INFO
13 pyfastaq.egg-info/SOURCES.txt
14 pyfastaq.egg-info/dependency_links.txt
15 pyfastaq.egg-info/top_level.txt
16 pyfastaq/runners/__init__.py
17 pyfastaq/runners/acgtn_only.py
18 pyfastaq/runners/add_indels.py
19 pyfastaq/runners/caf_to_fastq.py
20 pyfastaq/runners/capillary_to_pairs.py
21 pyfastaq/runners/chunker.py
22 pyfastaq/runners/count_sequences.py
23 pyfastaq/runners/deinterleave.py
24 pyfastaq/runners/enumerate_names.py
25 pyfastaq/runners/expand_nucleotides.py
26 pyfastaq/runners/fasta_to_fastq.py
27 pyfastaq/runners/filter.py
28 pyfastaq/runners/get_ids.py
29 pyfastaq/runners/get_seq_flanking_gaps.py
30 pyfastaq/runners/interleave.py
31 pyfastaq/runners/make_random_contigs.py
32 pyfastaq/runners/merge.py
33 pyfastaq/runners/replace_bases.py
34 pyfastaq/runners/reverse_complement.py
35 pyfastaq/runners/scaffolds_to_contigs.py
36 pyfastaq/runners/search_for_seq.py
37 pyfastaq/runners/sequence_trim.py
38 pyfastaq/runners/sort_by_name.py
39 pyfastaq/runners/sort_by_size.py
40 pyfastaq/runners/split_by_base_count.py
41 pyfastaq/runners/strip_illumina_suffix.py
42 pyfastaq/runners/to_boulderio.py
43 pyfastaq/runners/to_fake_qual.py
44 pyfastaq/runners/to_fasta.py
45 pyfastaq/runners/to_mira_xml.py
46 pyfastaq/runners/to_orfs_gff.py
47 pyfastaq/runners/to_perfect_reads.py
48 pyfastaq/runners/to_random_subset.py
49 pyfastaq/runners/to_tiling_bam.py
50 pyfastaq/runners/to_unique_by_id.py
51 pyfastaq/runners/translate.py
52 pyfastaq/runners/trim_Ns_at_end.py
53 pyfastaq/runners/trim_contigs.py
54 pyfastaq/runners/trim_ends.py
55 pyfastaq/runners/version.py
56 pyfastaq/tests/caf_test.py
57 pyfastaq/tests/intervals_test.py
58 pyfastaq/tests/sequences_test.py
59 pyfastaq/tests/tasks_test.py
60 pyfastaq/tests/utils_test.py
61 pyfastaq/tests/data/sequences_test_3-per-line.fa
62 pyfastaq/tests/data/sequences_test_cap_to_read_pairs.fa
63 pyfastaq/tests/data/sequences_test_cap_to_read_pairs.fa.paired.gz
64 pyfastaq/tests/data/sequences_test_cap_to_read_pairs.fa.unpaired.gz
65 pyfastaq/tests/data/sequences_test_deinterleaved_1.fa
66 pyfastaq/tests/data/sequences_test_deinterleaved_2.fa
67 pyfastaq/tests/data/sequences_test_deinterleaved_bad2_1.fa
68 pyfastaq/tests/data/sequences_test_deinterleaved_bad2_2.fa
69 pyfastaq/tests/data/sequences_test_deinterleaved_bad_1.fa
70 pyfastaq/tests/data/sequences_test_deinterleaved_bad_2.fa
71 pyfastaq/tests/data/sequences_test_deinterleaved_no_suffixes_1.fa
72 pyfastaq/tests/data/sequences_test_deinterleaved_no_suffixes_2.fa
73 pyfastaq/tests/data/sequences_test_empty_file
74 pyfastaq/tests/data/sequences_test_enumerate_names.fa
75 pyfastaq/tests/data/sequences_test_enumerate_names.fa.out.add_suffix
76 pyfastaq/tests/data/sequences_test_enumerate_names.fa.out.keep_suffix
77 pyfastaq/tests/data/sequences_test_enumerate_names.fa.out.start.1
78 pyfastaq/tests/data/sequences_test_enumerate_names.fa.out.start.1.rename_file
79 pyfastaq/tests/data/sequences_test_enumerate_names.fa.out.start.2
80 pyfastaq/tests/data/sequences_test_fai_test.fa
81 pyfastaq/tests/data/sequences_test_fai_test.fa.fai
82 pyfastaq/tests/data/sequences_test_fail_no_AT.fq
83 pyfastaq/tests/data/sequences_test_fail_no_plus.fq
84 pyfastaq/tests/data/sequences_test_fail_no_qual.fq
85 pyfastaq/tests/data/sequences_test_fail_no_seq.fq
86 pyfastaq/tests/data/sequences_test_fastaq_replace_bases.expected.fa
87 pyfastaq/tests/data/sequences_test_fastaq_replace_bases.fa
88 pyfastaq/tests/data/sequences_test_filter_by_ids_file.fa
89 pyfastaq/tests/data/sequences_test_filter_by_ids_file.fa.filtered
90 pyfastaq/tests/data/sequences_test_filter_by_ids_file.fa.filtered.invert
91 pyfastaq/tests/data/sequences_test_filter_by_ids_file.fa.ids
92 pyfastaq/tests/data/sequences_test_filter_by_regex.fa
93 pyfastaq/tests/data/sequences_test_filter_by_regex.first-char-a.fa
94 pyfastaq/tests/data/sequences_test_filter_by_regex.first-of-pair.fa
95 pyfastaq/tests/data/sequences_test_filter_by_regex.numeric.fa
96 pyfastaq/tests/data/sequences_test_get_seqs_flanking_gaps.fa
97 pyfastaq/tests/data/sequences_test_get_seqs_flanking_gaps.fa.out
98 pyfastaq/tests/data/sequences_test_gffv3.gff
99 pyfastaq/tests/data/sequences_test_gffv3.gff.fasta
100 pyfastaq/tests/data/sequences_test_gffv3.gff.to_fasta
101 pyfastaq/tests/data/sequences_test_gffv3.no_FASTA_line.gff
102 pyfastaq/tests/data/sequences_test_gffv3.no_FASTA_line.gff.to_fasta
103 pyfastaq/tests/data/sequences_test_gffv3.no_seq.2.gff
104 pyfastaq/tests/data/sequences_test_gffv3.no_seq.gff
105 pyfastaq/tests/data/sequences_test_good_file.fq
106 pyfastaq/tests/data/sequences_test_good_file.fq.to_fasta
107 pyfastaq/tests/data/sequences_test_good_file_mira.xml
108 pyfastaq/tests/data/sequences_test_interleaved.fa
109 pyfastaq/tests/data/sequences_test_interleaved.fq
110 pyfastaq/tests/data/sequences_test_interleaved_bad.fa
111 pyfastaq/tests/data/sequences_test_interleaved_with_suffixes.fa
112 pyfastaq/tests/data/sequences_test_length_filter.fa
113 pyfastaq/tests/data/sequences_test_length_filter.min-0.max-1.fa
114 pyfastaq/tests/data/sequences_test_length_filter.min-0.max-inf.fa
115 pyfastaq/tests/data/sequences_test_length_filter.min-4.max-4.fa
116 pyfastaq/tests/data/sequences_test_make_random_contigs.default.fa
117 pyfastaq/tests/data/sequences_test_make_random_contigs.first-42.fa
118 pyfastaq/tests/data/sequences_test_make_random_contigs.name-by-letters.fa
119 pyfastaq/tests/data/sequences_test_make_random_contigs.prefix-p.fa
120 pyfastaq/tests/data/sequences_test_merge_to_one_seq.fa
121 pyfastaq/tests/data/sequences_test_merge_to_one_seq.fq
122 pyfastaq/tests/data/sequences_test_merge_to_one_seq.merged.fa
123 pyfastaq/tests/data/sequences_test_merge_to_one_seq.merged.fq
124 pyfastaq/tests/data/sequences_test_not_a_fastaq_file
125 pyfastaq/tests/data/sequences_test_one-per-line.fa
126 pyfastaq/tests/data/sequences_test_orfs.fa
127 pyfastaq/tests/data/sequences_test_orfs.gff
128 pyfastaq/tests/data/sequences_test_phylip.interleaved
129 pyfastaq/tests/data/sequences_test_phylip.interleaved.to_fasta
130 pyfastaq/tests/data/sequences_test_phylip.interleaved2
131 pyfastaq/tests/data/sequences_test_phylip.interleaved2.to_fasta
132 pyfastaq/tests/data/sequences_test_phylip.made_by_seaview
133 pyfastaq/tests/data/sequences_test_phylip.made_by_seaview.to_fasta
134 pyfastaq/tests/data/sequences_test_phylip.sequential
135 pyfastaq/tests/data/sequences_test_phylip.sequential.to_fasta
136 pyfastaq/tests/data/sequences_test_revcomp.fa
137 pyfastaq/tests/data/sequences_test_search_string.fa
138 pyfastaq/tests/data/sequences_test_search_string.fa.hits
139 pyfastaq/tests/data/sequences_test_split_fixed_size.fa
140 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.1
141 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.2
142 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.3
143 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.4
144 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.5
145 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.6
146 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.coords
147 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.skip_if_all_Ns.1
148 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.skip_if_all_Ns.2
149 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.skip_if_all_Ns.3
150 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.skip_if_all_Ns.4
151 pyfastaq/tests/data/sequences_test_split_fixed_size.fa.split.skip_if_all_Ns.coords
152 pyfastaq/tests/data/sequences_test_split_fixed_size_onefile.fa
153 pyfastaq/tests/data/sequences_test_split_fixed_size_onefile.out.fa
154 pyfastaq/tests/data/sequences_test_split_fixed_size_onefile.skip_Ns.out.fa
155 pyfastaq/tests/data/sequences_test_split_test.fa
156 pyfastaq/tests/data/sequences_test_split_test.fa.2.1
157 pyfastaq/tests/data/sequences_test_split_test.fa.2.2
158 pyfastaq/tests/data/sequences_test_split_test.fa.2.3
159 pyfastaq/tests/data/sequences_test_split_test.fa.2.4
160 pyfastaq/tests/data/sequences_test_split_test.fa.3.1
161 pyfastaq/tests/data/sequences_test_split_test.fa.3.2
162 pyfastaq/tests/data/sequences_test_split_test.fa.3.3
163 pyfastaq/tests/data/sequences_test_split_test.fa.4.1
164 pyfastaq/tests/data/sequences_test_split_test.fa.4.2
165 pyfastaq/tests/data/sequences_test_split_test.fa.4.3
166 pyfastaq/tests/data/sequences_test_split_test.fa.6.1
167 pyfastaq/tests/data/sequences_test_split_test.fa.6.2
168 pyfastaq/tests/data/sequences_test_split_test.fa.6.limit2.1
169 pyfastaq/tests/data/sequences_test_split_test.fa.6.limit2.2
170 pyfastaq/tests/data/sequences_test_split_test.fa.6.limit2.3
171 pyfastaq/tests/data/sequences_test_split_test.long.fa
172 pyfastaq/tests/data/sequences_test_split_test.long.fa.2.1
173 pyfastaq/tests/data/sequences_test_split_test.long.fa.2.2
174 pyfastaq/tests/data/sequences_test_strip_after_whitespace.fa
175 pyfastaq/tests/data/sequences_test_strip_after_whitespace.fa.to_fasta
176 pyfastaq/tests/data/sequences_test_strip_illumina_suffix.fq
177 pyfastaq/tests/data/sequences_test_strip_illumina_suffix.fq.stripped
178 pyfastaq/tests/data/sequences_test_to_fasta_union.in.fa
179 pyfastaq/tests/data/sequences_test_to_fasta_union.out.fa
180 pyfastaq/tests/data/sequences_test_to_unique_by_id.fa
181 pyfastaq/tests/data/sequences_test_to_unique_by_id.fa.out
182 pyfastaq/tests/data/sequences_test_translate.fa
183 pyfastaq/tests/data/sequences_test_translate.fa.frame0
184 pyfastaq/tests/data/sequences_test_translate.fa.frame1
185 pyfastaq/tests/data/sequences_test_translate.fa.frame2
186 pyfastaq/tests/data/sequences_test_trim_Ns_at_end.fa
187 pyfastaq/tests/data/sequences_test_trim_Ns_at_end.fa.trimmed
188 pyfastaq/tests/data/sequences_test_trim_contigs.fa
189 pyfastaq/tests/data/sequences_test_trim_contigs.fa.out
190 pyfastaq/tests/data/sequences_test_trimmed.fq
191 pyfastaq/tests/data/sequences_test_untrimmed.fq
192 pyfastaq/tests/data/tasks_test_expend_nucleotides.in.fa
193 pyfastaq/tests/data/tasks_test_expend_nucleotides.in.fq
194 pyfastaq/tests/data/tasks_test_expend_nucleotides.out.fa
195 pyfastaq/tests/data/tasks_test_expend_nucleotides.out.fq
196 pyfastaq/tests/data/tasks_test_fasta_to_fake_qual.in.fa
197 pyfastaq/tests/data/tasks_test_fasta_to_fake_qual.out.default.qual
198 pyfastaq/tests/data/tasks_test_fasta_to_fake_qual.out.q42.qual
199 pyfastaq/tests/data/tasks_test_filter_paired_both_pass.in_1.fa
200 pyfastaq/tests/data/tasks_test_filter_paired_both_pass.in_2.fa
201 pyfastaq/tests/data/tasks_test_filter_paired_both_pass.out_1.fa
202 pyfastaq/tests/data/tasks_test_filter_paired_both_pass.out_2.fa
203 pyfastaq/tests/data/tasks_test_filter_paired_one_pass.in_1.fa
204 pyfastaq/tests/data/tasks_test_filter_paired_one_pass.in_2.fa
205 pyfastaq/tests/data/tasks_test_filter_paired_one_pass.out_1.fa
206 pyfastaq/tests/data/tasks_test_filter_paired_one_pass.out_2.fa
207 pyfastaq/tests/data/tasks_test_length_offsets_from_fai.fa
208 pyfastaq/tests/data/tasks_test_length_offsets_from_fai.fa.fai
209 pyfastaq/tests/data/tasks_test_make_long_reads.input.fa
210 pyfastaq/tests/data/tasks_test_make_long_reads.output.fa
211 pyfastaq/tests/data/tasks_test_mean_length.fa
212 pyfastaq/tests/data/tasks_test_sequence_trim_1.fa
213 pyfastaq/tests/data/tasks_test_sequence_trim_1.trimmed.fa
214 pyfastaq/tests/data/tasks_test_sequence_trim_2.fa
215 pyfastaq/tests/data/tasks_test_sequence_trim_2.trimmed.fa
216 pyfastaq/tests/data/tasks_test_sequences_to_trim.fa
217 pyfastaq/tests/data/tasks_test_sort_by_name.in.fa
218 pyfastaq/tests/data/tasks_test_sort_by_name.out.fa
219 pyfastaq/tests/data/tasks_test_sort_by_size.in.fa
220 pyfastaq/tests/data/tasks_test_sort_by_size.out.fa
221 pyfastaq/tests/data/tasks_test_sort_by_size.out.rev.fa
222 pyfastaq/tests/data/tasks_test_stats_from_fai.in.empty.fai
223 pyfastaq/tests/data/tasks_test_stats_from_fai.in.fai
224 pyfastaq/tests/data/tasks_test_to_boulderio.in.fa
225 pyfastaq/tests/data/tasks_test_to_boulderio.out.boulder
226 pyfastaq/tests/data/tasks_test_to_fastg.fasta
227 pyfastaq/tests/data/tasks_test_to_fastg.fastg
228 pyfastaq/tests/data/tasks_test_to_fastg.ids_to_circularise
229 pyfastaq/tests/data/utils_test_file_transpose.txt
230 pyfastaq/tests/data/utils_test_file_transposed.txt
231 pyfastaq/tests/data/utils_test_not_really_zipped.gz
232 pyfastaq/tests/data/utils_test_scaffolds.fa
233 pyfastaq/tests/data/utils_test_scaffolds.fa.to_contigs.fa
234 pyfastaq/tests/data/utils_test_scaffolds.fa.to_contigs.number_contigs.fa
235 pyfastaq/tests/data/utils_test_system_call.txt
236 scripts/fastaq
0 [egg_info]
1 tag_build =
2 tag_date = 0
3