gfapy / 7b933c9
Update upstream source from tag 'upstream/1.1.0+dfsg'

Update to upstream version '1.1.0+dfsg' with Debian dir f4e65cb2228aa295ac06db0176433ee8e0c066e0

Sascha Steinbiss, 3 years ago
33 changed file(s) with 452 addition(s) and 176 deletion(s).
0 == 1.1.0 ==
1
2 - fix: custom tags are not necessarily lower case
3 - additional support for rGFA subset of GFA1 by setting option dialect="rgfa"
4
05 == 1.0.0 ==
16
27 - initial release
2222 mkdir -p manual
2323 cp doc/_build/latex/Gfapy.pdf manual/gfapy-manual.pdf
2424
25 doctest:
26 cd doc && make doctest
2527
26 # Run unit tests
27 tests:
28 cd doc && make doctest
28 unittests:
2929 @echo
3030 @echo "Running unit test suite..."
3131 @PYTHONHASHSEED=0 ${PYTHON} -m unittest discover
32
33 tests: doctest unittests
3234
3335 # Remove distribution files
3436 cleanup:
5757 Documentation
5858 ~~~~~~~~~~~~~
5959
60 The documentation, including this introduction to Gfapy, an user manual
60 The documentation, including this introduction to Gfapy, a user manual
6161 and the API documentation is hosted on the ReadTheDocs server,
6262 at the URL http://gfapy.readthedocs.io/en/latest/ and it can be
6363 downloaded as PDF from the URL
0 #!/usr/bin/env python3
1 """
2 Renumber the segments of a GFA assembly graph.
3 The largest segment is renamed 01, down to the smallest segment 99.
4 The amount of zero-padding required is determined automatically.
5 """
6
7 from gfapy import Gfa
8 import argparse
9 import math
10
11 argparser = argparse.ArgumentParser(description = __doc__)
12 argparser.add_argument("-o", "--out", action="store", default="/dev/stdout", help="output GFA file [/dev/stdout]")
13 argparser.add_argument("--version", action="version", version="gfapy-renumber 0.1.0")
14 argparser.add_argument("gfa", help="input GFA file")
15 args = argparser.parse_args()
16
17 g = Gfa.from_file(args.gfa)
18 names = g.segment_names
19 width = math.ceil(math.log10(len(names)))
20 names.sort(key = lambda u: g.segment(u).length, reverse = True)
21 for i, name in enumerate(names):
22 g.segment(name).name = str(1 + i).rjust(width, "0")
23 g.to_file(args.out)
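The new `gfapy-renumber` script sizes its zero padding with `math.ceil(math.log10(len(names)))`. The stdlib-only sketch below (no gfapy required) isolates that calculation. Segment lengths are passed as a plain dict standing in for `g.segment(name).length`, and the helper names are hypothetical. It suggests that when the segment count is an exact power of ten (10, 100, ...), the log-based width is one digit short of the longest label, so the labels end up with mixed widths. `len(str(n))` is shown as one possible alternative, not as what upstream does.

```python
import math


def width_from_log(n):
    # padding width as computed in gfapy-renumber: ceil(log10(n))
    return math.ceil(math.log10(n))


def width_from_digits(n):
    # alternative (assumption): number of digits of the largest label, n
    return len(str(n))


def renumber(lengths, width_fn=width_from_digits):
    """Map segment names to zero-padded labels, longest segment first.

    lengths: dict mapping name -> segment length (hypothetical input format).
    """
    names = sorted(lengths, key=lambda u: lengths[u], reverse=True)
    width = width_fn(len(names))
    return {name: str(1 + i).rjust(width, "0") for i, name in enumerate(names)}


for n in (9, 10, 99, 100):
    print(n, width_from_log(n), width_from_digits(n))

# ten segments: the log-based width leaves "1".."9" unpadded next to "10"
ten = {"s{}".format(k): k for k in range(10)}
print(sorted(renumber(ten, width_from_log).values()))
print(sorted(renumber(ten).values()))
```

With the digit-count width, all ten labels have the same length ("01" to "10"), so they also sort correctly as strings.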
7373 # built documents.
7474 #
7575 # The short X.Y version.
76 version = '1.0.0'
76 version = '1.1'
7777 # The full version, including alpha/beta/rc tags.
78 release = '1.0.0'
78 release = '1.1.0'
7979
8080 # The language for content autogenerated by Sphinx. Refer to documentation
8181 # for a list of supported languages.
2525 tutorial/comments
2626 tutorial/errors
2727 tutorial/graph_operations
28 tutorial/rgfa
2829
2930 Indices and tables
3031 ==================
105105 :class:`~gfapy.alignment.cigar.CIGAR`, whose elements are CIGAR operations
106106 CIGAR operations are represented by instance of the class
107107 :class:`~gfapy.alignment.cigar.CIGAR.Operation`,
108 and provide the properties ``length`` (lenght of the operation, an integer)
108 and provide the properties ``length`` (length of the operation, an integer)
109109 and ``code`` (one-letter string which specifies the type of operation).
110110 Note that not all operations allowed in SAM files (for which CIGAR strings
111111 were first defined) are also meaningful in GFA and thus GFA2 only allows
2121 itself is read-only. To remove a comment from the Gfa, you need to find the
2222 instance in the list, and call
2323 :func:`~gfapy.line.common.disconnection.Disconnection.disconnect` on it. To
24 add a comment to a :class:`~gfapy.gfa.Gfa` instance is done similary to other
24 add a comment to a :class:`~gfapy.gfa.Gfa` instance is done similarly to other
2525 lines, by using the :func:`Gfa.add_line(line)
2626 <gfapy.lines.creators.Creators.add_line>` method.
2727
7070 structural elements of the line). No further validation is performed.
7171
7272 As Gfapy cannot know how many positional fields are present when parsing custom
73 records, an heuristic approach is followed, to identify tags. A field resembles
73 records, a heuristic approach is followed, to identify tags. A field resembles
7474 a tag if it starts with ``tn:d:`` where ``tn`` is a valid tag name and ``d`` a
7575 valid tag datatype (see :ref:`tags` chapter). The fields are parsed from the
7676 last to the first.
139139 :func:`~gfapy.line.line.Line.register_extension`.
140140
141141 The constants to define are ``RECORD_TYPE``, which shall be the content
142 of the record type field (e.g. ``M``); ``POSFIELDS`` shall contain a ordered
142 of the record type field (e.g. ``M``); ``POSFIELDS`` shall contain an ordered
143143 dict, specifying the datatype for each positional field, in the order these
144144 fields are found in the line; ``TAGS_DATATYPE`` is a dict, specifying the
145145 datatype of the predefined optional tags; ``NAME_FIELD`` is a field name,
5858
5959 All methods for creating a Gfa (constructor and from_file) accept
6060 a ``vlevel`` parameter, the validation level,
61 and can assume the values 0, 1, 2 and 3. An higher value means
61 which can assume the values 0, 1, 2 and 3. A higher value means
6262 more validations are performed. The :ref:`validation` chapter explains
6363 the meaning of the different validation levels in detail.
6464 The default value is 1.
105105
106106 The property :attr:`~gfapy.lines.collections.Collections.lines`
107107 of the Gfa object is a list of all the lines
108 in the GFA file (including the header, which is splitted into single-tag
108 in the GFA file (including the header, which is split into single-tag
109109 lines). The list itself shall not be modified by the user directly (i.e.
110110 adding and removing lines is done using a different interface, see
111111 below). However the single elements of the list can be edited.
120120 .. doctest::
121121
122122 >>> [str(line) for line in gfa1.segments]
123 ['S\t1\t*', 'S\t3\t*', 'S\t2\t*']
123 ['S\t1\t*', 'S\t2\t*', 'S\t3\t*']
124124 >>> [str(line) for line in gfa2.fragments]
125125 []
126126
127127 A particular case are edges; these are in GFA1 links and containments, while in
128 GFA2 there is an unified edge record type, which also allows to represent
128 GFA2 there is a unified edge record type, which also allows to represent
129129 internal alignments. In Gfapy, the
130130 :attr:`~gfapy.lines.collections.Collections.edges` property retrieves all edges
131131 (i.e. all E lines in GFA2, and all L and C lines in GFA1). The
187187 .. doctest::
188188
189189 >>> [str(line) for line in gfa2.custom_records]
190 ['Y\tcustom line', 'X\tcustom line']
190 ['X\tcustom line', 'Y\tcustom line']
191191 >>> gfa2.custom_record_keys
192 ['Y', 'X']
192 ['X', 'Y']
193193 >>> [str(line) for line in gfa2.custom_records_of_type('X')]
194194 ['X\tcustom line']
195195
229229 >>> g.add_line("U\ts1\tA b_c g")
230230 >>> g.add_line("G\tg\tA+\tB-\t1000\t*")
231231 >>> g.names
232 ['B', 'C', 'A', 'b_c', 'g', 'p1', 's1']
233 >>> g.segment_names
234 ['B', 'C', 'A']
232 ['A', 'B', 'C', 'b_c', 'g', 'p1', 's1']
233 >>> g.segment_names
234 ['A', 'B', 'C']
235235 >>> g.path_names
236236 ['p1']
237237 >>> g.edge_names
242242 ['s1']
243243
244244 The GFA1 specification does not handle the question of the namespace of
245 identifiers explicitely. However, gfapy assumes and enforces
245 identifiers explicitly. However, gfapy assumes and enforces
246246 a single namespace for segment, path names and the values of the ID tags
247247 of L and C lines. The content of this namespace can be found using
248248 :attr:`~gfapy.lines.collections.Collections.names` property.
250250 can be retrieved using the properties
251251 :attr:`~gfapy.lines.collections.Collections.segment_names`,
252252 :attr:`~gfapy.lines.collections.Collections.edge_names`
253 (ID tags of of links and containments) and
253 (ID tags of links and containments) and
254254 :attr:`~gfapy.lines.collections.Collections.path_names`.
255255 For GFA1, the properties
256256 :attr:`~gfapy.lines.collections.Collections.gap_names`,
266266 >>> g.add_line("L\tB\t+\tC\t+\t*\tID:Z:b_c")
267267 >>> g.add_line("P\tp1\tB+,C+\t*")
268268 >>> g.names
269 ['B', 'C', 'A', 'b_c', 'p1']
270 >>> g.segment_names
271 ['B', 'C', 'A']
269 ['A', 'B', 'C', 'b_c', 'p1']
270 >>> g.segment_names
271 ['A', 'B', 'C']
272272 >>> g.path_names
273273 ['p1']
274274 >>> g.edge_names
328328 ['A']
329329 >>> g.append("S\tB\t*") #doctest: +ELLIPSIS
330330 >>> g.segment_names
331 ['B', 'A']
331 ['A', 'B']
332332
333333 Editing the lines
334334 ~~~~~~~~~~~~~~~~~
398398 >>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
399399 >>> g.add_line("L\tA\t+\tB\t-\t*") #doctest: +ELLIPSIS
400400 >>> g.segment_names
401 ['B', 'A']
401 ['A', 'B']
402402 >>> g.dovetails[0].from_name
403403 'A'
404404 >>> g.segment('A').name = 'C'
138138 String representation of the header
139139 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
140140
141 For consinstency with other line types, the string representation of the header
141 For consistency with other line types, the string representation of the header
142142 is a single-line string, possibly not standard-compliant if it contains
143143 multiple instances of the same tag. Likewise, when calling
144144 :meth:`~gfapy.line.common.writer.Writer.field_to_s` for a tag present multiple
145145 times, the output string contains all instances of the tag, separated by
146146 tabs.
147147
148 However, when the Gfa is output to file or string, the header is splitted into
148 However, when the Gfa is output to file or string, the header is split into
149149 multiple H lines with single tags, so that standard-compliant GFA is output.
150 The splitted header can be retrieved using the
150 The split header can be retrieved using the
151151 :attr:`~gfapy.lines.headers.Headers.headers` property of the Gfa instance.
152152
153153 .. doctest::
2020 ~~~~~~~~~~~
2121
2222 The field names are derived from the specification. Lower case versions
23 of the field names are used and spaces are subsituted with underscores.
23 of the field names are used and spaces are substituted with underscores.
2424 In some cases, the field names were changed, as they represent keywords
2525 in common programming languages (``from``, ``send``).
2626
441441 Cryptic field names
442442 ^^^^^^^^^^^^^^^^^^^^
443443
444 The definition of from and to for containments is somewhat cryptical.
444 The definition of from and to for containments is somewhat cryptic.
445445 Therefore the following aliases have been defined for containments:
446446 container[\_orient] for from[\_\|segment\|orient]; contained[\_orient]
447447 for to[\_segment\|orient].
2626 automatically when a GFA file is parsed). All strings expressing
2727 references are then changed into references to the corresponding line
2828 objects. The method ``is_connected()`` allows to determine if a line is
29 connected to an gfapy instance. The read-only property ``gfa`` contains
29 connected to a gfapy instance. The read-only property ``gfa`` contains
3030 the ``gfapy.Gfa`` instance to which the line is connected.
3131
3232 .. doctest::
247247 The item list in GFA2 sets and paths may not contain elements which are
248248 implicitly involved. For example a path may contain segments, without
249249 specifying the edges connecting them, if there is only one such edge.
250 Alternatively a path may contain edges, without explitely indicating the
250 Alternatively a path may contain edges, without explicitly indicating the
251251 segments. Similarly a set may contain edges, but not the segments
252 refered to in them, or contain segments which are connected by edges,
252 referred to in them, or contain segments which are connected by edges,
253253 without the edges themselves. Furthermore groups may refer to other
254254 groups (set to sets or paths, paths to paths only), which then
255255 indirectly contain references to segments and edges.
261261 applied to connected lines. For unordered groups, this computation is
262262 provided by the method ``induced_set()``, which returns an array of
263263 segment and edge instances. For ordered groups, the computation is
264 provided by the method ``captured_path()``, whcih returns a list of
264 provided by the method ``captured_path()``, which returns a list of
265265 ``gfapy.OrientedLine`` instances, alternating segment and edge instances
266266 (and starting and ending in segments).
267267
299299 >>> gfa.append('S\tsB\t*')
300300 >>> line = gfa.segment("sA")
301301 >>> gfa.segment_names
302 ['sB', 'sA']
302 ['sA', 'sB']
303303 >>> gfa.rm(line)
304304 >>> gfa.segment_names
305305 ['sB']
0 .. testsetup:: *
1
2 import gfapy
3
4 .. _rgfa:
5
6 rGFA
7 ----
8
9 rGFA (https://github.com/lh3/gfatools/blob/master/doc/rGFA.md)
10 is a subset of GFA1, in which only particular line types (S and L)
11 are allowed, and the S lines are required to contain the tags
12 ``SN`` (of type ``Z``), ``SO`` and ``SR`` (of type ``i``).
13
14 When working with rGFA files, it is convenient to use the ``dialect="rgfa"``
15 option in the constructor ``Gfa()`` and in
16 :func:`Gfa.from_file() <gfapy.gfa.Gfa.from_file>`.
17
18 This ensures that additional validations are performed: the GFA version must
19 be 1, only rGFA-compatible lines (S, L) are allowed, the required tags must be
20 present with the correct datatype, and link overlaps must be ``0M``. The
21 validations can also be run manually with :func:`Gfa.validate_rgfa() <gfapy.gfa.Gfa.validate_rgfa>`.
22
23 Furthermore, the ``stable_sequence_names`` property of the Gfa object
24 returns a list of the distinct stable sequence names found in the ``SN`` tags
25 of the segments (an empty list if the dialect is not ``rgfa``).
26
27 .. doctest::
28
29 >>> g = gfapy.Gfa("S\tS1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0", dialect="rgfa")
30 >>> g.segment_names
31 ['S1']
32 >>> g.stable_sequence_names
33 ['chr1']
34 >>> g.add_line("S\tS2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0")
35
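To make the rules listed above concrete, here is a minimal stdlib-only sketch that applies them to raw tab-separated lines. It is not gfapy's implementation: `check_rgfa` and `parse_tags` are hypothetical helpers, the mandatory S tags are copied from `RGFA_TAGS` in the new `gfapy/rgfa.py`, the optional datatype checks on L tags (`SR`, `L1`, `L2`) are left out, and plain `ValueError` is raised where gfapy uses its own error classes.

```python
# Tags required on rGFA S lines and their datatypes, copied from
# RGFA_TAGS["mandatory"]["S"] in the new gfapy/rgfa.py
RGFA_S_TAGS = {"SN": "Z", "SO": "i", "SR": "i"}
RGFA_RECORD_TYPES = {"S", "L"}


def parse_tags(fields):
    """Return {name: (datatype, value)} for fields of the form NN:T:DATA."""
    tags = {}
    for field in fields:
        name, datatype, value = field.split(":", 2)
        tags[name] = (datatype, value)
    return tags


def check_rgfa(lines):
    """Check lines against the rGFA rules; return the stable sequence names.

    Raises ValueError on the first violation.
    """
    stable_names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        rt = fields[0]
        # H, P, C and any other record types are not allowed in rGFA
        if rt not in RGFA_RECORD_TYPES:
            raise ValueError("rGFA does not support {} lines".format(rt))
        if rt == "S":
            # S <name> <sequence> <tags...>
            tags = parse_tags(fields[3:])
            for tag, datatype in RGFA_S_TAGS.items():
                if tag not in tags:
                    raise ValueError("rGFA S lines must have a {} tag".format(tag))
                if tags[tag][0] != datatype:
                    raise ValueError(
                        "rGFA {} tags must have datatype {}".format(tag, datatype))
            stable_names.add(tags["SN"][1])
        else:
            # L <from> <orient> <to> <orient> <overlap> <tags...>
            if fields[5] != "0M":
                raise ValueError("rGFA CIGARs must be 0M")
    return sorted(stable_names)


print(check_rgfa(["S\tS1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
                  "S\tS2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0",
                  "L\tS1\t+\tS2\t+\t0M"]))
```

Like `stable_sequence_names`, the sketch collects the `SN` values in a set. Sorting them is a choice of the sketch; the property itself returns them in no particular order.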
99
1010 Each record in GFA can contain tags. Tags are fields which consist of a
1111 tag name, a datatype and data. The format is ``NN:T:DATA`` where ``NN``
12 is a two-letter tag name, ``T`` is an one-letter datatype string and
12 is a two-letter tag name, ``T`` is a one-letter datatype string and
1313 ``DATA`` is a string representing the data according to the specified
1414 datatype. Tag names must be unique for each line, i.e. each line may
1515 only contain a tag once.
2828 Custom tags
2929 ~~~~~~~~~~~
3030
31 Some tags are explicitely defined in the specification (these are named
31 Some tags are explicitly defined in the specification (these are named
3232 *predefined tags* in Gfapy), and the user or an application can define
33 its own custom tags.
33 its own custom tags. These may contain lower case letters.
3434
3535 Custom tags are user or program specific and may of course collide with
3636 the tags used by other users or programs. For this reason, if you write
6262 myvalue = line.get(mytag)
6363 # ...
6464
65 Tag names in GFA1
66 ~~~~~~~~~~~~~~~~~
67
68 According to the GFA1 specification, custom tags are lower case, while
69 predefined tags are upper case (in both cases the second character in
70 the name can be a number). There is a number of predefined tags in the
71 specification, different for each kind of line.
72
73 ::
74
75 "VN:Z:1.0" # VN is upcase => predefined tag
65 Predefined tags
66 ~~~~~~~~~~~~~~~
67
68 According to the GFA specifications, predefined tag names consist of either
69 two upper case letters, or an upper case letter followed by a digit.
70 The GFA1 specification predefines tags for each line type, while GFA2
71 only predefines tags for the header and edges.
72
73 While tags with the predefined names are allowed to be added to any line,
74 when they are used in the lines mentioned in the specification (e.g. ``VN``
75 in the header) gfapy checks that the datatype is the one prescribed by
76 the specification (e.g. ``VN`` must be of type ``Z``). It is not forbidden
77 to use the same tags in other contexts, but in this case, the datatype
78 restriction is not enforced.
79
80 +-----+------+------------+-------------+
81 | Tag | Type | Line types | GFA version |
82 +=====+======+============+=============+
83 | VN  | Z    | H          | 1,2         |
84 +-----+------+------------+-------------+
85 | TS  | i    | H,S        | 2           |
86 +-----+------+------------+-------------+
87 | LN  | i    | S          | 1           |
88 +-----+------+------------+-------------+
89 | RC  | i    | S,L,C      | 1           |
90 +-----+------+------------+-------------+
91 | FC  | i    | S,L        | 1           |
92 +-----+------+------------+-------------+
93 | KC  | i    | S,L        | 1           |
94 +-----+------+------------+-------------+
95 | SH  | H    | S          | 1           |
96 +-----+------+------------+-------------+
97 | UR  | Z    | S          | 1           |
98 +-----+------+------------+-------------+
99 | MQ  | i    | L          | 1           |
100 +-----+------+------------+-------------+
101 | NM  | i    | L,C        | 1           |
102 +-----+------+------------+-------------+
103 | ID  | Z    | L,C        | 1           |
104 +-----+------+------------+-------------+
105
106 ::
107
108 "VN:Z:1.0" # VN => predefined tag
76109 "z5:Z:1.0" # z5 first char is downcase => custom tag
77
78 # not forbidden, but not reccomended:
110 "XX:Z:aaa" # XX upper case, but not predefined => custom tag
111
112 # not forbidden, but not recommended:
79113 "zZ:Z:1.0" # => mixed case, first char downcase => custom tag
80114 "Zz:Z:1.0" # => mixed case, first char upcase => custom tag
81115 "vn:Z:1.0" # => same name as predefined tag, but downcase => custom tag
82116
83 Besides the tags described in the specification, in GFA1 headers, the TS
84 tag is allowed, in order to simplify the translation of GFA2 files.
85
86 Tag names in GFA2
87 ~~~~~~~~~~~~~~~~~
88
89 The GFA2 specification is currently not as strict regarding tags: anyone
90 can use both upper and lower case tags, and no tags are predefined
91 except for VN and TS.
92
93 However, Gfapy follows the same conventions as for GFA1: i.e. it allows
94 the tags specified as predefined tags in GFA1 to be used also in GFA2.
95 No other upper case tag is allowed in GFA2.
96
97117 Datatypes
98118 ~~~~~~~~~
99119
120140 Validation
121141 ~~~~~~~~~~
122142
123 The tag name is validated according the the rules described above:
124 except for the upper case tags indicated in the GFA1 specification, and
125 the TS header tag, all other tags must contain at least one lower case
126 letter.
127
128 ::
129
130 "VN:i:1" # => in header: allowed, elsewhere: error
131 "TS:i:1" # => allowed in headers and GFA2 Edges
132 "KC:i:1" # => allowed in links, containments, GFA1/GFA2 segments
133 "xx:i:1" # => custom tag, always allowed
143 The tag names must consist of a letter and a digit or two letters.
144
145 ::
146
147 "KC:i:1" # => OK
148 "xx:i:1" # => OK
149 "x1:i:1" # => OK
134150 "xxx:i:1" # => error: name is too long
135151 "x:i:1" # => error: name is too short
136152 "11:i:1" # => error: at least one letter must be present
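The rule above corresponds to the pattern in the new `_is_valid_custom_tagname` (`^[A-Za-z][A-Za-z0-9]$`). Below is a small sketch of the same check, assuming the input is a whole `NN:T:DATA` field. It uses `re.fullmatch` rather than `re.match` with `$`, so a trailing newline is also rejected. Because the pattern requires the first character to be a letter, `1x` is rejected as well, a case the examples above do not list.

```python
import re

# Same character classes as _is_valid_custom_tagname in gfapy 1.1.0;
# fullmatch anchors both ends without letting "$" accept a trailing newline
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]")


def valid_tag_name(field):
    """Check only the name part of a tag field of the form NN:T:DATA."""
    name = field.split(":", 1)[0]
    return TAG_NAME.fullmatch(name) is not None


for field in ["KC:i:1", "xx:i:1", "x1:i:1", "xxx:i:1", "x:i:1", "11:i:1", "1x:i:1"]:
    print(field, valid_tag_name(field))
```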
1414 Manual validation
1515 ~~~~~~~~~~~~~~~~~
1616
17 Independently from the validation level choosen, the user can always check the
17 Independently from the validation level chosen, the user can always check the
1818 value of a field calling
1919 :meth:`~gfapy.line.common.validate.Validate.validate_field` on the line
20 instance. If no exeption is raised, the field content is valid.
20 instance. If no exception is raised, the field content is valid.
2121
2222 To check if the entire content of the line is valid, the user can call
2323 :meth:`~gfapy.line.common.validate.Validate.validate` on the line instance.
00 VERSIONS = ["gfa1", "gfa2"]
1 DIALECTS = ["rgfa", "standard"]
12 from gfapy.error import *
23 from gfapy.placeholder import Placeholder
34 from gfapy.placeholder import is_placeholder
3939 """
4040 mod = gfapy.Field.FIELD_MODULE.get(datatype)
4141 if mod is None:
42 linemsg = ""
4243 try:
43 linemsg = ("Line content: " + str(line) + "\n") if line is not None else ""
44 if line is not None and not line.__error__:
45 line.__error__ = True # avoids infinite recursion
46 linemsg = ["Line content:"]
47 linemsg.append(str(line))
48 linemsg = " ".join(linemsg) + "\n"
4449 except:
45 linemsg = ""
50 pass
4651 fieldnamemsg = "Field: {}\n".format(fieldname) if fieldname else ""
4752 contentmsg = "Content: {}\n".format(string)
4853 raise gfapy.TypeError(
5661 else:
5762 return mod.unsafe_decode(string)
5863 except Exception as err:
64 linemsg = ""
5965 try:
60 linemsg = ("Line content: " + str(line) + "\n") if line is not None else ""
66 if line is not None and not line.__error__:
67 line.__error__ = True # avoids infinite recursion
68 linemsg = ["Line content:"]
69 linemsg.append(str(line))
70 linemsg = " ".join(linemsg) + "\n"
6171 except:
62 linemsg = ""
72 pass
6373 fieldnamemsg = "Field: {}\n".format(fieldname) if fieldname else ""
6474 contentmsg = "Content: {}\n".format(string)
6575 datatypemsg = "Datatype: {}\n".format(datatype)
76 errmsg = err.message if hasattr(err, "message") else str(err)
6677 raise err.__class__(
6778 linemsg +
6879 fieldnamemsg +
6980 datatypemsg +
7081 contentmsg +
71 (err.message if hasattr(err, "message") else str(err))) from err
82 errmsg) from err
7283
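Building the error message above calls `str(line)`, and formatting a line can itself parse fields and end up in the same error path. The `line.__error__` flag cuts that loop. Below is a minimal sketch of the pattern, where `Record` and `parse_field` are hypothetical stand-ins for `gfapy.Line` and the field parser, and only digit strings count as valid fields.

```python
class Record:
    """Hypothetical stand-in for gfapy.Line: str() re-parses every field."""

    def __init__(self, fields):
        self.fields = fields
        self.__error__ = False  # set once an error message is being built

    def __str__(self):
        return "\t".join(parse_field(f, line=self) for f in self.fields)


def parse_field(string, line=None):
    """Accept digit strings; otherwise raise with as much context as possible."""
    if string.isdigit():
        return string
    linemsg = ""
    try:
        if line is not None and not line.__error__:
            line.__error__ = True  # a nested failure inside str(line) skips this block
            linemsg = "Line content: " + str(line) + "\n"
    except Exception:
        pass  # str(line) failed too: report the field without the line
    raise ValueError(linemsg + "Content: " + string)


for fields, bad in ((["1", "2"], "y"), (["1", "x"], "x")):
    try:
        parse_field(bad, line=Record(fields))
    except ValueError as err:
        print(repr(str(err)))
```

In the second call, `str(line)` fails on the same field. Because the flag is already set, the nested call skips the message-building block, and the outer call reports the field alone. Without the flag, each failure would call `str(line)` again and the calls would nest until the interpreter's recursion limit is reached.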
7384 @staticmethod
7485 def _parse_gfa_tag(tag):
11 from .lines import Lines
22 from .graph_operations import GraphOperations
33 from collections import defaultdict
4 from .rgfa import RGFA
45 import sys
56
6 class Gfa(Lines,GraphOperations):
7 class Gfa(Lines,GraphOperations,RGFA):
78 """Representation of the data in a GFA file.
89
910 Parameters:
1314 vlevel (int): validation level (default: 1)
1415 version (str): GFA version ('gfa1' or 'gfa2';
1516 default: automatic recognition)
17 dialect (str): dialect ('standard' or 'rgfa';
18 default: standard)
1619
1720 Raises:
1821 ~gfapy.error.ArgumentError: if the vlevel or version are invalid
1922 ~gfapy.error.FormatError: if data is provided, which is invalid
2023 ~gfapy.error.VersionError: if an unknown version is specified, or data is
2124 provided, which is not compatible with the specified version
25 ~gfapy.error.VersionError: if an unknown dialect is specified
2226 """
2327
24 def __init__(self, *args, vlevel = 1, version = None):
28 def __init__(self, *args, vlevel = 1, version = None, dialect = "standard"):
2529 if not isinstance(vlevel, int):
2630 raise gfapy.ArgumentError("vlevel is not an integer ({})".format(vlevel))
2731 if vlevel < 0:
2933 "vlevel is not a positive integer ({})".format(vlevel))
3034 if not version in ['gfa1', 'gfa2', None]:
3135 raise gfapy.VersionError("GFA version unknown ({})".format(version))
36 if not dialect in ['standard', 'rgfa', None]:
37 raise gfapy.VersionError("GFA dialect unknown ({})".format(dialect))
3238 self._vlevel = vlevel
3339 self._max_int_name = 0
3440 self._records = defaultdict(dict)
5864 self._version_explanation = "set during initialization"
5965 self._version_guess = version
6066 self._validate_version()
67 self._dialect = dialect.lower()
6168 if len(args) == 1:
6269 lst = None
6370 if isinstance(args[0], str):
8390
8491 @version.setter
8592 def version(self,value):
86 self._vlevel=value
93 self._version=value
94
95 @property
96 def dialect(self):
97 """GFA dialect ('standard' or 'rgfa')"""
98 return self._dialect
99
100 @dialect.setter
101 def dialect(self, value):
102 self._dialect = value.lower()
87103
88104 @property
89105 def vlevel(self):
103119 self.__validate_path_links()
104120 self.__validate_group_items()
105121 self.__validate_gfa2_positions()
122 if self._dialect == "rgfa":
123 self.validate_rgfa()
106124
107125 def __str__(self):
108126 return "\n".join([str(line) for line in self.lines])
199217 return self
200218
201219 @classmethod
202 def from_file(cls, filename, vlevel = 1, version = None):
220 def from_file(cls, filename, vlevel = 1, version = None, dialect="standard"):
203221 """Create a Gfa instance from the contents of a GFA file.
204222
205223 Parameters:
211229 Returns:
212230 gfapy.Gfa
213231 """
214 gfa = cls(vlevel = vlevel, version = version)
232 gfa = cls(vlevel = vlevel, version = version, dialect = dialect)
215233 gfa.read_file(filename)
216234 return gfa
217235
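As the hunk reads, `Gfa.__init__` checks `dialect` against `['standard', 'rgfa', None]` before calling `dialect.lower()`. As a result, `None` passes the check but then fails on `.lower()`, and `'RGFA'` is rejected even though the value is lower-cased afterwards. The `dialect` setter stores any value without checking it. One possible alternative, sketched below with a hypothetical `DialectOption` class (not part of gfapy), normalises first and validates in a single setter that both the constructor and later assignments use. It raises `ValueError`, whereas gfapy uses `gfapy.VersionError` for unknown dialects.

```python
DIALECTS = ["rgfa", "standard"]  # as in the new gfapy/__init__.py


class DialectOption:
    """Hypothetical holder for a dialect setting (not gfapy's class)."""

    def __init__(self, dialect="standard"):
        self.dialect = dialect  # validated by the setter below

    @property
    def dialect(self):
        return self._dialect

    @dialect.setter
    def dialect(self, value):
        # normalise first (None -> default, any case -> lower case), then validate
        normalized = "standard" if value is None else value.lower()
        if normalized not in DIALECTS:
            raise ValueError("GFA dialect unknown ({})".format(value))
        self._dialect = normalized

    def is_rgfa(self):
        return self._dialect == "rgfa"


print(DialectOption(None).dialect, DialectOption("RGFA").is_rgfa())
```

Routing the constructor through the property keeps a single place where dialect values are accepted or rejected.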
4242 * different: different syntax in different versions
4343 """
4444
45 def __new__(cls, data, vlevel = 1, virtual = False, version = None):
45 def __new__(cls, data, vlevel = 1, virtual = False, dialect = "standard",
46 version = None):
4647 if isinstance(data, str):
4748 data = data.split("\t")
4849 if isinstance(data, list) and cls.RECORD_TYPE == None:
4950 cls = gfapy.Line._subclass(data, version = version)
5051 return object.__new__(cls)
5152
52 def __init__(self, data, vlevel = 1, virtual = False, version = None):
53 def __init__(self, data, vlevel = 1, virtual = False,
54 version = None, dialect = "standard"):
55 self._dialect = dialect.lower()
5356 self.vlevel = vlevel
5457 self._virtual = virtual
5558 self._datatype = {}
4545 ----------
4646 fieldname : str
4747 The name of the field to set.
48 (positional field, predefined tag (uppercase) or custom tag (lowercase))
48 (positional field, predefined or custom tag)
4949
5050 Raises
5151 ------
6969 if self._datatype.get(fieldname, None) is not None:
7070 return self._set_existing_field(fieldname, value)
7171 elif value is not None:
72 self._datatype[fieldname] = gfapy.Field._get_default_gfa_tag_datatype(value)
72 self._datatype[fieldname] = \
73 gfapy.Field._get_default_gfa_tag_datatype(value)
7374 self._data[fieldname] = value
7475 return self._data[fieldname]
7576 else:
4343 if self._is_predefined_tag(n):
4444 self._validate_predefined_tag_type(n, self._field_datatype(n))
4545 elif not self._is_valid_custom_tagname(n):
46 raise gfapy.FormatError("Custom tags must be lower case\n"+
47 "Found: {}".format(n))
46 raise gfapy.FormatError("Custom tag names must consist of a letter "+
47 "and a digit or two letters\nFound: {}".format(n))
4848
4949 def _validate_predefined_tag_type(self, tagname, datatype):
5050 if datatype != self.__class__.DATATYPE[tagname]:
5454
5555 def _validate_custom_tagname(self, tagname):
5656 if not self._is_valid_custom_tagname(tagname):
57 raise gfapy.FormatError("Custom tags must be lower case\n"+
58 "Found: {}".format(tagname))
57 raise gfapy.FormatError("Custom tag names must consist of a letter "+
58 "and a digit or two letters\nFound: {}".format(tagname))
5959
6060 @staticmethod
6161 def _is_valid_custom_tagname(tagname):
62 return (re.match(r"^[a-z][a-z0-9]$", tagname))
62 return (re.match(r"^[A-Za-z][A-Za-z0-9]$", tagname))
6363
6464 def _validate_record_type_specific_info(self):
6565 pass
1616 GFA specification version
1717 """
1818 return self._version
19
20 @property
21 def dialect(self):
22 """
23 Returns
24 -------
25 gfapy.DIALECTS, None
26 GFA dialect
27 """
28 return self._dialect
1929
2030 def to_version_s(self, version):
2131 """
3737 str list
3838 A list of string representations of the fields.
3939 """
40 a = [self.record_type]
40 a = []
4141 errors = []
42 try:
43 rt = self.record_type
44 except:
45 rt = "<error>"
46 errors.append("record_type")
47 a.append(rt)
4248 for fn in self.positional_fieldnames:
4349 try:
4450 fstr = self.field_to_s(fn, tag = False)
101107 try:
102108 s = str(self)
103109 except:
104 s = "\t".join([ self.record_type + "(error!)" ] + \
105 [ repr(self.get(fn)) for fn in self.positional_fieldnames ] + \
106 [ (fn + ":" + self.get_datatype(fn) + ":" + repr(self.get(fn))) for fn in self.tagnames ])
110 rt = self.record_type + "(error!)"
111 s = [ rt ]
112 for fn in self.positional_fieldnames:
113 try:
114 field_s = repr(self.get(fn))
115 except:
116 field_s = "<error>"
117 s.append(field_s)
118 for tn in self.tagnames:
119 dt = self.get_datatype(tn)
120 try:
121 tv = repr(self.get(tn))
122 except:
123 tv = "<ERROR>"
124 s.append("{}:{}:{}".format(tn,dt,tv))
125 s = "\t".join(s)
107126 return "gfapy.Line('{0}',version='{1}',vlevel={2})".format(s,self.version,self.vlevel)
108127
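The new `__repr__` fallback formats each field separately, so a single field that cannot be converted no longer hides the rest of the line. Below is the same idea as a self-contained sketch. `debug_fields` is hypothetical and takes getter callables in place of `self.get(fn)`. It uses a single `<error>` placeholder, where the hunk writes `<ERROR>` for tag values.

```python
def debug_fields(record_type, positional, tags):
    """Hypothetical helper mirroring the new fallback in Line.__repr__.

    positional: list of zero-argument callables returning field values;
    tags: list of (name, datatype, getter) tuples.
    A getter that raises is shown as "<error>" instead of aborting the repr.
    """
    parts = [record_type + "(error!)"]
    for get in positional:
        try:
            parts.append(repr(get()))
        except Exception:
            parts.append("<error>")
    for name, datatype, get in tags:
        try:
            value = repr(get())
        except Exception:
            value = "<error>"
        parts.append("{}:{}:{}".format(name, datatype, value))
    return "\t".join(parts)


def broken():
    raise RuntimeError("cannot decode field")


print(debug_fields("S", [lambda: "s1", broken],
                   [("LN", "i", lambda: 5), ("xx", "Z", broken)]))
```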
109128 def refstr(self, maxlen=10):
3434 version (str) : one of 'gfa1' and 'gfa2'; the GFA version; if not specified,
3535 then the version is guessed from the record type and syntax, or set
3636 to 'generic'
37 dialect (str) : one of 'rgfa' and 'standard'; the GFA dialect; if not
38 specified then the dialect is set to 'standard'; 'rgfa' is only valid
39 when version is 'gfa1'
3740
3841 Notes:
3942 The private interface to the Line constructor also allows to pass a
4346
4447 Raises:
4548 gfapy.error.FormatError: If the line contains a wrong number of positional
46 fields, if non-predefined tags use upcase letters, or if the content of a
47 field has a wrong format.
49 fields, or if the content of a field has a wrong format.
4850 gfapy.error.NotUniqueError: If a tag name is used more than once.
4951 gfapy.error.TypeError: If the value of a predefined tag does not
5052 respect the datatype specified in the tag.
8383 "Only strings and gfapy.Line instances can be added")
8484 if rt == "#":
8585 if isinstance(gfa_line, str):
86 gfa_line = gfapy.Line(gfa_line)
86 gfa_line = gfapy.Line(gfa_line, dialect=self._dialect)
8787 gfa_line.connect(self)
8888 elif rt == "H":
8989 if isinstance(gfa_line, str):
90 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel)
90 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
91 dialect=self._dialect)
9192 self.header._merge(gfa_line)
9293 if gfa_line.VN:
9394 if gfa_line.VN == "1.0":
102103 self.process_line_queue()
103104 elif rt == "S":
104105 if isinstance(gfa_line, str):
105 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel)
106 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
107 dialect=self._dialect)
106108 self._version = gfa_line.version
107109 self._version_explanation = \
108110 "implied by: syntax of S {} line".format(gfa_line.name)
113115 self._version_explanation = "implied by: presence of a {} line".format(rt)
114116 if isinstance(gfa_line, str):
115117 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
116 version=self._version)
118 version=self._version, dialect=self._dialect)
117119 self.process_line_queue()
118120 gfa_line.connect(self)
119121 elif rt in ["L", "C", "P"]:
125127 def __add_line_GFA1(self, gfa_line):
126128 if isinstance(gfa_line, str):
127129 if gfa_line[0] == "S":
128 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel)
130 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
131 dialect=self._dialect)
129132 else:
130133 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
131 version="gfa1")
134 dialect=self._dialect, version="gfa1")
132135 elif gfa_line.__class__ in gfapy.Lines.GFA2Specific:
133136 raise gfapy.VersionError(
134137 "Version: 1.0 ({})\n".format(self._version_explanation)+
156159 def __add_line_GFA2(self, gfa_line):
157160 if isinstance(gfa_line, str):
158161 if gfa_line[0] == "S":
159 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel)
162 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
163 dialect=self._dialect)
160164 else:
161165 gfa_line = gfapy.Line(gfa_line, vlevel=self._vlevel,
162 version="gfa2")
166 version="gfa2", dialect=self._dialect)
163167 elif gfa_line.__class__ in gfapy.Lines.GFA1Specific:
164168 raise gfapy.VersionError(
165169 "Version: 2.0 ({})\n".format(self._version_explanation)+
0 import gfapy
1
2 class RGFA():
3 """
4 Add support of rGFA format.
5 A dialect-specific validation method is added, as well as convenience
6 methods to handle the stable sequence names.
7 """
8
9 def is_rgfa(self):
10 """
11 Indicate that rGFA dialect of GFA1 shall be used
12 """
13 return self._dialect == "rgfa"
14
15 @property
16 def stable_sequence_names(self):
17 """Stable sequence names from rGFA SN tags"""
18 if self._dialect != "rgfa":
19 return []
20 stable_seqs = set()
21 for s in self.segments:
22 stable_seqs.add(s.SN)
23 return list(stable_seqs)
24
25 def validate_rgfa(self):
26 """
27 Validate rGFA
28
29 - version must be 1.0
30 - no H, P, C lines are present
31 - S lines have rGFA-specific predefined tags
32 - if L lines have rGFA-specific tags, they have the correct type
33 - overlaps must be 0M
34 """
35 self._validate_rgfa_version()
36 self._validate_rgfa_no_headers()
37 self._validate_rgfa_no_containments()
38 self._validate_rgfa_no_paths()
39 self._validate_rgfa_tags_in_lines(self.segments)
40 self._validate_rgfa_tags_in_lines(self.dovetails)
41 self._validate_rgfa_link_overlaps()
42
43 def _validate_rgfa_version(self):
44 """Validate version of rGFA (it must be gfa1)"""
45 if self.version != "gfa1":
46 raise gfapy.VersionError("rGFA format only supports GFA version 1")
47
48 def _validate_rgfa_no_headers(self):
49 """Validate the absence of H lines in rGFA"""
50 if self.headers:
51 raise gfapy.ValueError("rGFA does not support header lines")
52
53 def _validate_rgfa_no_containments(self):
54 """Validate the absence of C lines in rGFA"""
55 if self.containments:
56 raise gfapy.ValueError("rGFA does not support containment lines")
57
58 def _validate_rgfa_no_paths(self):
59 """Validate the absence of P lines in rGFA"""
60 if self.paths:
61 raise gfapy.ValueError("rGFA does not support path lines")
62
63 RGFA_TAGS = {
64 "mandatory": {
65 "S": {"SN": "Z", "SO": "i", "SR": "i"},
66 "L": {},
67 },
68 "optional": {
69 "S": {},
70 "L": {"SR": "i", "L1": "i", "L2": "i"},
71 },
72 }
73
74 def _validate_rgfa_tags_in_lines(self, lines):
75 """
76 Validate rGFA tags for a group of lines
77 """
78 for line in lines:
79 rt = line.record_type
80 tags_check_presence = gfapy.Gfa.RGFA_TAGS["mandatory"].get(rt, {})
81 tags_check_datatype = tags_check_presence.copy()
82 tags_check_datatype.update(gfapy.Gfa.RGFA_TAGS["optional"].get(rt,{}))
83 for tag, datatype in tags_check_presence.items():
84 if tag not in line.tagnames:
85 raise gfapy.NotFoundError(
86 "rGFA {} lines must have a {} tag\n".format(rt, tag)+
87 "offending line:\n{}".format(str(line)))
88 for tag, datatype in tags_check_datatype.items():
89 if tag in line.tagnames:
90 if line.get_datatype(tag) != datatype:
91 raise gfapy.ValueError(
92 "rGFA {} tags in {} lines must have datatype {}\n".format(
93 tag, rt, datatype)+
94 "offending line:\n{}".format(str(line)))
95
96 def _validate_rgfa_link_overlaps(self):
97 for link in self.dovetails:
98 if link.field_to_s("overlap") != "0M":
99 raise gfapy.ValueError("rGFA CIGARs must be 0M\n"+
100 "offending line:\n{}".format(str(link)))
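The rules listed in the `validate_rgfa` docstring can be illustrated without gfapy. The following is a minimal, hypothetical sketch (`check_rgfa` and its helpers are not part of gfapy) that applies the same checks to raw tab-separated GFA1 text: it rejects H, C and P lines, requires SN:Z, SO:i and SR:i on S lines, checks the datatype of the optional SR, L1 and L2 tags on L lines, and requires 0M overlaps. Unlike gfapy it collects messages instead of raising, it omits the version check, and it assumes well-formed lines and TAG:TYPE:VALUE fields.

```python
"""Hypothetical stand-alone sketch of the rGFA checks; not gfapy's API."""

# Tags mirrored from RGFA_TAGS above: tag name -> expected datatype.
MANDATORY = {"S": {"SN": "Z", "SO": "i", "SR": "i"}, "L": {}}
OPTIONAL = {"S": {}, "L": {"SR": "i", "L1": "i", "L2": "i"}}
# Record types that rGFA does not allow.
FORBIDDEN = {"H": "header", "C": "containment", "P": "path"}


def parse_tags(fields):
    """Map each TAG:TYPE:VALUE field to {TAG: (TYPE, VALUE)}."""
    tags = {}
    for field in fields:
        name, datatype, value = field.split(":", 2)
        tags[name] = (datatype, value)
    return tags


def check_rgfa(text):
    """Return a list of error messages; an empty list means no violation found."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        rt = fields[0]
        if rt in FORBIDDEN:
            errors.append("line {}: rGFA does not support {} lines".format(
                lineno, FORBIDDEN[rt]))
            continue
        if rt == "S":
            # S <name> <sequence> [tags...]
            tags = parse_tags(fields[3:])
        elif rt == "L":
            # L <from> <orient> <to> <orient> <overlap> [tags...]
            if fields[5] != "0M":
                errors.append("line {}: rGFA CIGARs must be 0M".format(lineno))
            tags = parse_tags(fields[6:])
        else:
            continue  # other record types are not checked in this sketch
        # presence of mandatory tags
        for tag in MANDATORY[rt]:
            if tag not in tags:
                errors.append("line {}: {} lines must have a {} tag".format(
                    lineno, rt, tag))
        # datatype of mandatory and optional tags, when present
        expected = dict(MANDATORY[rt], **OPTIONAL[rt])
        for tag, datatype in expected.items():
            if tag in tags and tags[tag][0] != datatype:
                errors.append("line {}: {} tag must have datatype {}".format(
                    lineno, tag, datatype))
    return errors
```

For instance, `check_rgfa("S\t1\t*")` reports the three missing SN, SO and SR tags, the same situation that `test_adding_invalid_segment_to_rgfa` exercises through gfapy.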
88 sys.exit("Sorry, only Python 3 is supported")
99
1010 setup(name='gfapy',
11 version='1.0.0',
11 version='1.1.0',
1212 description='Library for handling data in the GFA1 and GFA2 formats',
1313 long_description=readme(),
1414 url='https://github.com/ggonnella/gfapy',
3131 'Topic :: Software Development :: Libraries',
3232 ],
3333 packages=find_packages(),
34 scripts=['bin/gfapy-convert','bin/gfapy-validate',
35 'bin/gfapy-mergelinear'],
34 scripts=['bin/gfapy-convert',
35 'bin/gfapy-mergelinear',
36 'bin/gfapy-renumber',
37 'bin/gfapy-validate'],
3638 zip_safe=False,
3739 test_suite="nose.collector",
3840 include_package_data=True,
0 import gfapy
1 import unittest
2
3 class TestAPIrGfa(unittest.TestCase):
4
5 def test_adding_invalid_segment_to_rgfa(self):
6 gfa = gfapy.Gfa()
7 gfa.append("S\t1\t*")
8 gfa.validate()
9 gfa = gfapy.Gfa(dialect="rgfa")
10 gfa.append("S\t1\t*")
11 with self.assertRaises(gfapy.NotFoundError): gfa.validate()
12
13 def test_adding_containment_to_rgfa(self):
14 gfa = gfapy.Gfa()
15 gfa.append("C\t1\t+\t2\t+\t12\t*")
16 gfa.validate()
17 gfa = gfapy.Gfa(version="gfa1",dialect="rgfa")
18 gfa.append("C\t1\t+\t2\t+\t12\t*")
19 with self.assertRaises(gfapy.NotFoundError): gfa.validate()
20
21 def test_loading_examples(self):
22 gfapy.Gfa.from_file("tests/testdata/rgfa_example.1.gfa", dialect="rgfa")
23 gfapy.Gfa.from_file("tests/testdata/rgfa_example.2.gfa", dialect="rgfa")
24
25 def test_stable_sequence_names(self):
26 g = gfapy.Gfa.from_file("tests/testdata/rgfa_example.2.gfa", dialect="rgfa")
27 self.assertCountEqual(['smpl-Ref.Bd4', 'smpl-Bd21_3_r.pseudomolecule_4'],
28 g.stable_sequence_names)
29 g = gfapy.Gfa.from_file("tests/testdata/rgfa_example.1.gfa", dialect="rgfa")
30 self.assertCountEqual(['bar', 'foo', 'chr1'],
31 g.stable_sequence_names)
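The `stable_sequence_names` property above collects the SN values in a `set`, so the order of the returned list depends on string hashing, which Python randomizes per process unless PYTHONHASHSEED is fixed. A minimal sketch of an order-stable alternative, assuming first-appearance order is acceptable, uses a dict as an insertion-ordered set. The function below is hypothetical, works on raw GFA text and is not gfapy code.

```python
def stable_sequence_names(gfa_text):
    """Distinct SN:Z values of the S lines, in order of first appearance."""
    names = {}  # dicts preserve insertion order (Python 3.7+)
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        for field in fields[3:]:
            if field.startswith("SN:Z:"):
                names.setdefault(field[len("SN:Z:"):], None)
    return list(names)


example = "\n".join([
    "S\ts1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts5\tTTTC\tSN:Z:foo\tSO:i:8\tSR:i:1",
    "S\ts7\tGTTAC\tSN:Z:bar\tSO:i:5\tSR:i:2",
    "S\ts2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0",
])
print(stable_sequence_names(example))  # ['chr1', 'foo', 'bar']
```

Sorting the result would be an equally deterministic choice; first-appearance order has the advantage of following the file layout.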
1515
1616 def test_custom_tags(self):
1717 for version in ["gfa1","gfa2"]:
18 # upper case
19 gfapy.line.Header(["H", "ZZ:Z:1"], version=version, vlevel=0) # nothing raised
20 gfapy.line.Header("H\tZZ:Z:1", version=version, vlevel=0) # nothing raised
21 gfapy.line.Header("H\tZZ:Z:1", version=version, vlevel=0) # nothing raised
22 gfapy.Gfa("H\tZZ:Z:1", version=version, vlevel=0) # nothing raised
23 for level in [1,2,3]:
24 self.assertRaises(gfapy.FormatError,
25 gfapy.line.Header,["H", "ZZ:Z:1"], version=version, vlevel=level)
26 self.assertRaises(gfapy.FormatError,
27 gfapy.Line, "H\tZZ:Z:1", version=version, vlevel=level)
28 self.assertRaises(gfapy.FormatError,
29 gfapy.Gfa, "H\tZZ:Z:1", version=version, vlevel=level)
30 # lower case
31 for level in [0,1,2,3]:
32 gfapy.line.Header(["H", "zz:Z:1"], version=version, vlevel=0) # nothing raised
33 gfapy.Line("H\tzz:Z:1", version=version, vlevel=0) # nothing raised
34 gfapy.Gfa("H\tzz:Z:1", version=version, vlevel=0) # nothing raised
18 for tagname in ["ZZ","Z1","Zz","zz"]:
19 for level in [0,1,2,3]:
20 tag = "{}:Z:1".format(tagname)
21 gfapy.line.Header(["H", tag], version=version, vlevel=level) # nothing raised
22 gfapy.Line("H\t"+tag, version=version, vlevel=level) # nothing raised
23 gfapy.Gfa("H\t"+tag, version=version, vlevel=level) # nothing raised
3524
3625 def test_wrong_tag_format(self):
3726 self.assertRaises(gfapy.FormatError, gfapy.line.Header, ["H", "VN i:1"])
8271 l = gfapy.line.Header(["H", "zz:i:1", "VN:Z:1.0"], version="gfa1", vlevel=0)
8372 l.zz = "x"
8473 self.assertRaises(gfapy.FormatError, l.validate)
85 # wrong predefined tag name
86 l = gfapy.line.Header(["H", "zz:i:1", "VZ:Z:1.0"], version="gfa1", vlevel=0)
87 self.assertRaises(gfapy.FormatError, l.validate)
8874 # wrong predefined tag datatype
8975 l = gfapy.line.Header(["H", "zz:i:1", "VN:i:1"], version="gfa1", vlevel=0)
9076 self.assertRaises(gfapy.TypeError, l.validate)
9278 # test tags for get/set tests:
9379 # - KC -> predefined, set
9480 # - RC -> predefined, not set;
95 # - XX -> custom, invalid (upper case)
9681 # - xx -> custom set
9782 # - zz -> custom not set
83 # - XX -> custom, not set, upper case
9884
9985 def test_get_tag_content(self):
10086 for version in ["gfa1","gfa2"]:
10591 # test presence of tag
10692 assert(l.KC)
10793 assert(not l.RC)
108 with self.assertRaises(AttributeError): l.XX
94 assert(not l.XX)
10995 assert(l.xx)
11096 assert(not l.zz)
111 # get tag content, fieldname methods
97 # tagname as attribute
11298 self.assertEqual(10, l.KC)
11399 self.assertEqual(None, l.RC)
114 with self.assertRaises(AttributeError): l.XX
100 self.assertEqual(None, l.XX)
115101 self.assertEqual(1.3, l.xx)
116102 self.assertEqual(None, l.zz)
117 # get tag content, get()
103 # get(tagname)
118104 self.assertEqual(10, l.get("KC"))
119105 self.assertEqual(None, l.get("RC"))
120106 self.assertEqual(None, l.get("XX"))
121107 self.assertEqual(1.3, l.get("xx"))
122108 self.assertEqual(None, l.get("zz"))
123 # banged version, fieldname methods
109 # try_get_<tagname>()
124110 self.assertEqual(10, l.try_get_KC())
125111 self.assertRaises(gfapy.NotFoundError, l.try_get_RC)
126 with self.assertRaises(AttributeError): l.try_get_XX()
112 with self.assertRaises(gfapy.NotFoundError):
113 l.try_get_XX()
127114 self.assertEqual(1.3, l.try_get_xx())
128115 with self.assertRaises(gfapy.NotFoundError):
129116 l.try_get_zz()
130 # banged version, get()
117 # try_get(tagname)
131118 self.assertEqual(10, l.try_get("KC"))
132119 self.assertRaises(gfapy.NotFoundError, l.try_get, "RC")
133120 self.assertRaises(gfapy.NotFoundError, l.try_get, "XX")
134121 self.assertEqual(1.3, l.try_get("xx"))
135122 self.assertRaises(gfapy.NotFoundError, l.try_get, "zz")
136 # get tag datatype
123 # get_datatype(tagname)
137124 self.assertEqual("i", l.get_datatype("KC"))
138125 self.assertEqual("i", l.get_datatype("RC"))
139126 self.assertEqual(None, l.get_datatype("XX"))
140127 self.assertEqual("f", l.get_datatype("xx"))
141128 self.assertEqual(None, l.get_datatype("zz"))
142 # as string: content only
129 # field_to_s(tagname, tag=False)
143130 self.assertEqual("10", l.field_to_s("KC"))
144131 self.assertRaises(gfapy.NotFoundError, l.field_to_s, "RC")
145132 self.assertRaises(gfapy.NotFoundError, l.field_to_s, "XX")
146133 self.assertEqual("1.3", l.field_to_s("xx"))
147134 self.assertRaises(gfapy.NotFoundError, l.field_to_s, "zz")
148 # as string: complete
135 # field_to_s(tagname, tag=True)
149136 self.assertEqual("KC:i:10", l.field_to_s("KC", tag=True))
150137 self.assertEqual("xx:f:1.3", l.field_to_s("xx", tag=True))
151 ## # respond_to? normal version
152 ## assert(l.respond_to?("KC"))
153 ## assert(l.respond_to?("RC"))
154 ## assert(not l.respond_to?("XX"))
155 ## assert(l.respond_to?("xx"))
156 ## assert(l.respond_to?("zz"))
157 ## # respond_to? banged version
158 ## assert(l.respond_to?("KC"!))
159 ## assert(l.respond_to?("RC"!))
160 ## assert(not l.respond_to?("XX"!))
161 ## assert(l.respond_to?("xx"!))
162 ## assert(l.respond_to?("zz"!))
163138
164139 def test_set_tag_content(self):
165140 for version in ["gfa1","gfa2"]:
175150 l.set("RC", 14) # nothing raised; self.assertEqual(14, l.RC)
176151 l.set("xx", 1.4) # nothing raised; self.assertEqual(1.4, l.xx)
177152 l.set("zz", 1.4) # nothing raised; self.assertEqual(1.4, l.zz)
178 # check respond_to method
179 ### assert(l.has_attr("KC"))
180 ### assert(l.has_attr("RC"))
181 ### assert(not l.respond_to?("XX"=))
182 ### assert(l.respond_to?("xx"=))
183 ### assert(l.respond_to?("zz"=))
184153 # set datatype for predefined field
185 self.assertRaises(gfapy.RuntimeError, l.set_datatype, "KC", "Z")
154 self.assertRaises(gfapy.RuntimeError, l.set_datatype, "KC","Z")
186155 self.assertRaises(gfapy.RuntimeError, l.set_datatype, "RC","Z")
187156 # set datatype for non-existing custom tag
188157 l.set_datatype("zz", "i") # nothing raised
189 if level == 0:
190 l.set_datatype("XX", "Z") # nothing raised
191 elif level >= 1:
192 self.assertRaises(gfapy.FormatError, l.set_datatype, "XX", "Z")
158 l.set_datatype("XX", "Z") # nothing raised
193159 # change datatype for existing custom tag
194160 l.xx = 1.1 # nothing raised
195161 l.xx = "1.1" # nothing raised
196162 if level == 2:
197163 l.xx = "1A" # nothing raised
198 with self.assertRaises(gfapy.Error):
164 with self.assertRaises(gfapy.FormatError):
199165 str(l)
200166 elif level == 3:
201167 with self.assertRaises(gfapy.FormatError):
223189 self.assertEqual(None, l.KC)
224190 self.assertEqual(["xx"], l.tagnames)
225191 l.set("RC",None) # nothing raised
226 if level == 0:
227 l.set("XX",None) # nothing raised
228 else:
229 self.assertRaises(gfapy.FormatError,l.set,"XX",None)
192 l.set("XX",None) # nothing raised
230193 l.set("xx",None) # nothing raised
231194 self.assertEqual([], l.tagnames)
232195 l.set("zz",None) # nothing raised
3535 gfapy.line.Header(["H", "zz:i:1", "VN:Z:1", "zz:i:2"])
3636
3737 def test_initialize_custom_tag(self):
38 with self.assertRaises(gfapy.FormatError):
39 gfapy.line.Header(["H", "ZZ:Z:1"])
38 gfapy.line.Header(["H", "ZZ:Z:1"]) # nothing raised
4039
4140 def test_record_type(self):
4241 l = gfapy.line.Header(["H", "xx:i:13", "VN:Z:HI"])
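The updated tests accept custom tag names in any case (`ZZ`, `Z1`, `Zz`, `zz`) at every validation level, while a malformed field such as `VN i:1` still raises FormatError. A rough sketch of the purely syntactic part of that check follows. It assumes a SAM-style pattern (a letter followed by a letter or digit, a one-letter datatype among A, i, f, Z, J, H, B, then the value); gfapy's own regular expression may differ, and `parse_tag` is a hypothetical helper.

```python
import re

# Assumed tag syntax: NAME:TYPE:VALUE, where NAME is a letter followed by a
# letter or digit (upper or lower case) and TYPE is one of the GFA datatypes.
TAG_RE = re.compile(r"([A-Za-z][A-Za-z0-9]):([AifZJHB]):(.*)")


def parse_tag(field):
    """Split a tag field into (name, datatype, value); raise ValueError if malformed."""
    match = TAG_RE.fullmatch(field)
    if match is None:
        raise ValueError("wrong tag format: {!r}".format(field))
    return match.groups()


for name in ["ZZ", "Z1", "Zz", "zz"]:
    print(parse_tag(name + ":Z:1"))
```

Because the pattern no longer encodes case, deciding whether a tag is predefined (such as VN) or custom has to happen after parsing, by looking the name up in the list of predefined tags.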
0 S s1 CTGAA SN:Z:chr1 SO:i:0 SR:i:0
1 S s2 ACG SN:Z:chr1 SO:i:5 SR:i:0
2 S s3 TGGC SN:Z:chr1 SO:i:8 SR:i:0
3 S s4 TGTGA SN:Z:chr1 SO:i:12 SR:i:0
4 S s5 TTTC SN:Z:foo SO:i:8 SR:i:1
5 S s6 CTGA SN:Z:foo SO:i:12 SR:i:1
6 S s7 GTTAC SN:Z:bar SO:i:5 SR:i:2
7 L s1 + s2 + 0M
8 L s2 + s3 + 0M
9 L s3 + s4 + 0M
10 L s2 + s5 + 0M
11 L s5 + s6 + 0M
12 L s6 + s4 + 0M
13 L s1 + s7 - 0M
14 L s7 - s6 + 0M
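In this example every S line carries SN, SO and SR tags. Assuming the rGFA convention that SO is the 0-based offset of the segment on the stable sequence named by SN, and SR its rank (0 for the reference), each segment can be mapped back to a half-open interval on its stable sequence. The helper below is a hypothetical illustration on raw text, not part of gfapy; it takes the segment length from the sequence or, when the sequence is `*`, from an LN tag, as in the second example file.

```python
def stable_intervals(gfa_text):
    """Map segment name -> (SN, start, end, SR) for every S line."""
    result = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        name, sequence = fields[1], fields[2]
        tags = {}
        for field in fields[3:]:
            tag, _datatype, value = field.split(":", 2)
            tags[tag] = value
        # length from the sequence itself, or from LN if the sequence is "*"
        length = int(tags["LN"]) if sequence == "*" else len(sequence)
        start = int(tags["SO"])
        result[name] = (tags["SN"], start, start + length, int(tags["SR"]))
    return result


example = "\n".join([
    "S\ts1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0",
    "S\ts3\tTGGC\tSN:Z:chr1\tSO:i:8\tSR:i:0",
    "S\ts4\tTGTGA\tSN:Z:chr1\tSO:i:12\tSR:i:0",
    "S\ts5\tTTTC\tSN:Z:foo\tSO:i:8\tSR:i:1",
])
for name, interval in stable_intervals(example).items():
    print(name, interval)
```

With these lines the rank-0 segments s1 to s4 tile chr1 without gaps: [0, 5), [5, 8), [8, 12) and [12, 17). The same holds in the second file, where s8165 (SO 26347760, LN 3788) ends at 26351548, the SO of s8166.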
0 S s8163 * LN:i:71969 SN:Z:smpl-Ref.Bd4 SO:i:26136048 SR:i:0
1 S s8164 * LN:i:139743 SN:Z:smpl-Ref.Bd4 SO:i:26208017 SR:i:0
2 S s8165 * LN:i:3788 SN:Z:smpl-Ref.Bd4 SO:i:26347760 SR:i:0
3 S s8166 * LN:i:50439 SN:Z:smpl-Ref.Bd4 SO:i:26351548 SR:i:0
4 S s14767 * LN:i:171 SN:Z:smpl-Bd21_3_r.pseudomolecule_4 SO:i:20514505 SR:i:2
5 L s8163 + s8164 + 0M SR:i:0 L1:i:71969 L2:i:139743
6 L s8163 + s14767 + 0M SR:i:2 L1:i:71969 L2:i:171
7 L s8164 + s8166 + 0M SR:i:2 L1:i:139743 L2:i:50439
8 L s8164 + s8165 + 0M SR:i:0 L1:i:139743 L2:i:3788
9 L s8165 + s8166 + 0M SR:i:0 L1:i:3788 L2:i:50439
10 L s14767 + s8164 + 0M SR:i:2 L1:i:171 L2:i:139743
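In this second example the L lines also carry SR, L1 and L2 tags, and in every link shown L1 and L2 equal the LN of the first and second segment respectively. Assuming this correspondence is intended (it is an observation from this data, not something `validate_rgfa` checks), a consistency check could look like the hypothetical helper below, which works on raw GFA text and is not part of gfapy.

```python
def parse_tags(fields):
    """Map TAG:TYPE:VALUE fields to {TAG: VALUE} (values kept as strings)."""
    tags = {}
    for field in fields:
        name, _datatype, value = field.split(":", 2)
        tags[name] = value
    return tags


def link_length_mismatches(gfa_text):
    """Return (from, to, tag) for each L1/L2 tag that disagrees with the segment length."""
    lengths, links = {}, []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            tags = parse_tags(fields[3:])
            if fields[2] != "*":
                lengths[fields[1]] = len(fields[2])
            elif "LN" in tags:
                lengths[fields[1]] = int(tags["LN"])
        elif fields[0] == "L":
            links.append((fields[1], fields[3], parse_tags(fields[6:])))
    mismatches = []
    # links are checked after the whole text is read, since S lines may follow L lines
    for from_seg, to_seg, tags in links:
        for tag, seg in (("L1", from_seg), ("L2", to_seg)):
            # a segment of unknown length (lengths.get -> None) is also reported
            if tag in tags and int(tags[tag]) != lengths.get(seg):
                mismatches.append((from_seg, to_seg, tag))
    return mismatches
```

Deferring the link checks until all lines are read keeps the helper independent of line order, which GFA1 does not prescribe.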