Commit 738c59bae8dbe2c5620d86cc802d1483b7f9e357 - gfapy

+0

-13

~~.gitignore~~ less more

0		# Compiled python modules
1		*.pyc
2
3		# Setuptools distribution folder
4		/dist/
5
6		# Python egg metadata, regenerated from source files by setuptools
7		/*.egg-info
8		/*.egg
9
10		# Wheel data
11		build
12		conda

+0

-10

~~.travis.yml~~ less more

0		language: python
1		python:
2		- "3.4"
3		env:
4		- PYTHONHASHSEED=0
5		install:
6		- pip install .
7		- pip install nose
8		- pip install Sphinx
9		script: "make tests"

+0

-8

~~CHANGES.txt~~ less more

0		== 1.1.0 ==
1
2		- fix: custom tags are not necessarily lower case
3		- additional support for rGFA subset of GFA1 by setting option dialect="rgfa"
4
5		== 1.0.0 ==
6
7		- initial release

+0

-6

~~CONTRIBUTORS~~ less more

0		The following contributors helped to develop gfapy. Please drop a note to
1		gonnella@zbh.uni-hamburg.de if I left someone out or missed something.
2
3		- Tim Weber (translation of parts of the code from Ruby to Python)
4		- Stefan Kurtz (advises)
5

+0

-19

~~LICENSE.txt~~ less more

0		All code of gfapy is released under the following ISC license.
1		It is functionally equivalent to a two-term BSD copyright with
2		language removed that is made unnecessary by the Berne convention.
3		See http://openbsd.org/policy.html for more information on copyrights.
4
5		Copyright (c) 2017 Giorgio Gonnella and CONTRIBUTORS
6		Copyright (c) 2017 Center for Bioinformatics, University of Hamburg
7
8		Permission to use, copy, modify, and distribute this software for any
9		purpose with or without fee is hereby granted, provided that the above
10		copyright notice and this permission notice appear in all copies.
11
12		THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
13		WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
14		MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
15		ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
16		WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
17		ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
18		OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

+0

-51

~~Makefile~~ less more

0		default: tests
1
2		.PHONY: manual tests cleanup upload conda sdist wheel install
3
4		PYTHON=python3
5		PIP=pip3
6
7		# Install using pip
8		install:
9		${PIP} install --upgrade --user --editable .
10
11		# Source distribution
12		sdist:
13		${PYTHON} setup.py sdist
14
15		# Pure Python Wheel
16		wheel:
17		${PYTHON} setup.py bdist_wheel
18
19		# Create the manual
20		manual:
21		cd doc && make latexpdf
22		mkdir -p manual
23		cp doc/_build/latex/Gfapy.pdf manual/gfapy-manual.pdf
24
25		doctest:
26		cd doc && make doctest
27
28		unittests:
29		@echo
30		@echo "Running unit test suite..."
31		@PYTHONHASHSEED=0 ${PYTHON} -m unittest discover
32
33		tests: doctest unittests
34
35		# Remove distribution files
36		cleanup:
37		rm -rf dist/ build/ gfapy.egg-info/
38
39		upload: tests cleanup sdist wheel
40		cd dist; \
41		for file in *; do \
42		twine register $$file; \
43		twine upload $$file; \
44		done
45
46		conda:
47		mkdir -p conda
48		cd conda; \
49		conda skeleton pypi gfapy; \
50		conda build gfapy

+151

-0

PKG-INFO less more

	0	Metadata-Version: 1.1
	1	Name: gfapy
	2	Version: 1.2.0
	3	Summary: Library for handling data in the GFA1 and GFA2 formats
	4	Home-page: https://github.com/ggonnella/gfapy
	5	Author: Giorgio Gonnella and others (see CONTRIBUTORS)
	6	Author-email: gonnella@zbh.uni-hamburg.de
	7	License: ISC
	8	Description: Gfapy
	9	~~~~~
	10
	11	\|travis\| \|readthedocs\| \|latesttag\| \|license\| \|requiresio\|
	12
	13	\|bioconda\| \|pypi\| \|debian\| \|ubuntu\|
	14
	15	.. sphinx-begin
	16
	17	The Graphical Fragment Assembly (GFA) are formats for the representation
	18	of sequence graphs, including assembly, variation and splicing graphs.
	19	Two versions of GFA have been defined (GFA1 and GFA2) and several sequence
	20	analysis programs have been adopting the formats as an interchange format,
	21	which allow to easily combine different sequence analysis tools.
	22
	23	This library implements the GFA1 and GFA2 specification
	24	described at https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md.
	25	It allows to create a Gfa object from a file in the GFA format
	26	or from scratch, to enumerate the graph elements (segments, links,
	27	containments, paths and header lines), to traverse the graph (by
	28	traversing all links outgoing from or incoming to a segment), to search for
	29	elements (e.g. which links connect two segments) and to manipulate the
	30	graph (e.g. to eliminate a link or a segment or to duplicate a segment
	31	distributing the read counts evenly on the copies).
	32
	33	The GFA format can be easily extended by users by defining own custom
	34	tags and record types. In Gfapy, it is easy to write extensions modules,
	35	which allow to define custom record types and datatypes for the parsing
	36	and validation of custom fields. The custom lines can be connected, using
	37	references, to each other and to lines of the standard record types.
	38
	39	Requirements
	40	~~~~~~~~~~~~
	41
	42	Gfapy has been written for Python 3 and tested using Python version 3.7.
	43	It does not require any additional Python packages or other software.
	44
	45	Installation
	46	~~~~~~~~~~~~
	47
	48	Gfapy is distributed as a Python package and can be installed using
	49	the Python package manager pip, as well as conda (in the Bioconda channel).
	50	It is also available as a package in some Linux distributions (Debian, Ubuntu).
	51
	52	The following command installs the current stable version from the Python
	53	Packages index::
	54
	55	pip install gfapy
	56
	57	If you would like to install the current development version from Github,
	58	use the following command::
	59
	60	pip install -e git+https://github.com/ggonnella/gfapy.git#egg=gfapy
	61
	62	Alternatively it is possible to install gfapy using conda. Gfapy is
	63	included in the Bioconda (https://bioconda.github.io/) channel::
	64
	65	conda install -c bioconda gfapy
	66
	67	Usage
	68	~~~~~
	69
	70	If you installed gfapy as described above, you can import it in your script
	71	using the conventional Python syntax::
	72
	73	>>> import gfapy
	74
	75	Documentation
	76	~~~~~~~~~~~~~
	77
	78	The documentation, including this introduction to Gfapy, a user manual
	79	and the API documentation is hosted on the ReadTheDocs server,
	80	at the URL http://gfapy.readthedocs.io/en/latest/ and it can be
	81	downloaded as PDF from the URL
	82	https://github.com/ggonnella/gfapy/blob/master/manual/gfapy-manual.pdf.
	83
	84	References
	85	~~~~~~~~~~
	86
	87	Giorgio Gonnella and Stefan Kurtz "GfaPy: a flexible and extensible software
	88	library for handling sequence graphs in Python", Bioinformatics (2017) btx398
	89	https://doi.org/10.1093/bioinformatics/btx398
	90
	91	.. sphinx-end
	92
	93	.. \|travis\|
	94	image:: https://travis-ci.com/ggonnella/gfapy.svg?branch=master
	95	:target: https://travis-ci.com/ggonnella/gfapy
	96	:alt: Travis
	97
	98	.. \|latesttag\|
	99	image:: https://img.shields.io/github/v/tag/ggonnella/gfapy
	100	:target: https://github.com/ggonnella/gfapy/tags
	101	:alt: Latest GitHub tag
	102
	103	.. \|readthedocs\|
	104	image:: https://readthedocs.org/projects/pip/badge/?version=stable
	105	:target: https://pip.pypa.io/en/stable/?badge=stable
	106	:alt: ReadTheDocs
	107
	108	.. \|bioconda\|
	109	image:: https://img.shields.io/conda/vn/bioconda/gfapy
	110	:target: https://bioconda.github.io/recipes/gfapy/README.html
	111	:alt: Bioconda
	112
	113	.. \|pypi\|
	114	image:: https://img.shields.io/pypi/v/gfapy
	115	:target: https://pypi.org/project/gfapy/
	116	:alt: PyPI
	117
	118	.. \|debian\|
	119	image:: https://img.shields.io/debian/v/gfapy
	120	:target: https://packages.debian.org/search?keywords=gfapy
	121	:alt: Debian
	122
	123	.. \|ubuntu\|
	124	image:: https://img.shields.io/ubuntu/v/gfapy
	125	:target: https://packages.ubuntu.com/search?keywords=gfapy
	126	:alt: Ubuntu
	127
	128	.. \|license\|
	129	image:: https://img.shields.io/pypi/l/gfapy
	130	:target: https://github.com/ggonnella/gfapy/blob/master/LICENSE.txt
	131	:alt: ISC License
	132
	133	.. \|requiresio\|
	134	image:: https://requires.io/github/ggonnella/gfapy/requirements.svg?branch=master
	135	:target: https://requires.io/github/ggonnella/gfapy/requirements/?branch=master
	136	:alt: Requirements Status
	137
	138	Keywords: bioinformatics genomics sequences GFA assembly graphs
	139	Platform: UNKNOWN
	140	Classifier: Development Status :: 5 - Production/Stable
	141	Classifier: Environment :: Console
	142	Classifier: Intended Audience :: Developers
	143	Classifier: Intended Audience :: End Users/Desktop
	144	Classifier: Intended Audience :: Science/Research
	145	Classifier: License :: OSI Approved :: ISC License (ISCL)
	146	Classifier: Operating System :: MacOS :: MacOS X
	147	Classifier: Operating System :: POSIX :: Linux
	148	Classifier: Programming Language :: Python :: 3 :: Only
	149	Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
	150	Classifier: Topic :: Software Development :: Libraries

+58

-1

README.rst less more

	0	Gfapy
	1	~~~~~
	2
	3	\|travis\| \|readthedocs\| \|latesttag\| \|license\| \|requiresio\|
	4
	5	\|bioconda\| \|pypi\| \|debian\| \|ubuntu\|
	6
	7	.. sphinx-begin
	8
0	9	The Graphical Fragment Assembly (GFA) are formats for the representation
1	10	of sequence graphs, including assembly, variation and splicing graphs.
2	11	Two versions of GFA have been defined (GFA1 and GFA2) and several sequence

22	31	Requirements
23	32	~~~~~~~~~~~~
24	33
25		Gfapy has been written for Python 3 and tested using Python version 3.3.
	34	Gfapy has been written for Python 3 and tested using Python version 3.7.
26	35	It does not require any additional Python packages or other software.
27	36
28	37	Installation

30	39
31	40	Gfapy is distributed as a Python package and can be installed using
32	41	the Python package manager pip, as well as conda (in the Bioconda channel).
	42	It is also available as a package in some Linux distributions (Debian, Ubuntu).
33	43
34	44	The following command installs the current stable version from the Python
35	45	Packages index::

69	79	Giorgio Gonnella and Stefan Kurtz "GfaPy: a flexible and extensible software
70	80	library for handling sequence graphs in Python", Bioinformatics (2017) btx398
71	81	https://doi.org/10.1093/bioinformatics/btx398
	82
	83	.. sphinx-end
	84
	85	.. \|travis\|
	86	image:: https://travis-ci.com/ggonnella/gfapy.svg?branch=master
	87	:target: https://travis-ci.com/ggonnella/gfapy
	88	:alt: Travis
	89
	90	.. \|latesttag\|
	91	image:: https://img.shields.io/github/v/tag/ggonnella/gfapy
	92	:target: https://github.com/ggonnella/gfapy/tags
	93	:alt: Latest GitHub tag
	94
	95	.. \|readthedocs\|
	96	image:: https://readthedocs.org/projects/pip/badge/?version=stable
	97	:target: https://pip.pypa.io/en/stable/?badge=stable
	98	:alt: ReadTheDocs
	99
	100	.. \|bioconda\|
	101	image:: https://img.shields.io/conda/vn/bioconda/gfapy
	102	:target: https://bioconda.github.io/recipes/gfapy/README.html
	103	:alt: Bioconda
	104
	105	.. \|pypi\|
	106	image:: https://img.shields.io/pypi/v/gfapy
	107	:target: https://pypi.org/project/gfapy/
	108	:alt: PyPI
	109
	110	.. \|debian\|
	111	image:: https://img.shields.io/debian/v/gfapy
	112	:target: https://packages.debian.org/search?keywords=gfapy
	113	:alt: Debian
	114
	115	.. \|ubuntu\|
	116	image:: https://img.shields.io/ubuntu/v/gfapy
	117	:target: https://packages.ubuntu.com/search?keywords=gfapy
	118	:alt: Ubuntu
	119
	120	.. \|license\|
	121	image:: https://img.shields.io/pypi/l/gfapy
	122	:target: https://github.com/ggonnella/gfapy/blob/master/LICENSE.txt
	123	:alt: ISC License
	124
	125	.. \|requiresio\|
	126	image:: https://requires.io/github/ggonnella/gfapy/requirements.svg?branch=master
	127	:target: https://requires.io/github/ggonnella/gfapy/requirements/?branch=master
	128	:alt: Requirements Status

+0

-3

~~benchmarks/.gitignore~~ less more

0		benchmark_results*
1		jobs_out
2		figure*

+0

-65

~~benchmarks/gfapy-benchmark-collectdata~~ less more

0		#!/bin/bash
1
2		#
3		# This script is derived from rdj-spacepeak.sh in
4		# the GenomeTools repository (www.genometools.org).
5		#
6		# (c) 2010-2017 Giorgio Gonnella, ZBH, University of Hamburg
7		#
8
9		sleeptime=0.1
10
11		if [ $# -eq 0 ]; then
12		echo "Usage: $0 <command> [args]"
13		echo
14		echo "The following information is polled each $sleeptime seconds"
15		echo "from /proc/[pid]/status:"
16		echo
17		echo " VmPeak: Peak virtual memory size."
18		echo " VmSize: Virtual memory size."
19		echo " VmLck: Locked memory size."
20		echo " VmHWM: Peak resident set size (\"high water mark\")."
21		echo " VmRSS: Resident set size."
22		echo " VmData, VmStk, VmExe: Size of data, stack, and text segments."
23		echo " VmLib: Shared library code size."
24		echo " VmPTE: Page table entries size (since Linux 2.6.10)."
25		echo
26		echo "The command is run under /usr/bin/time."
27		exit
28		fi
29
30		# code inspired by:
31		# http://stackoverflow.com/questions/1080461/
32		# /peak-memory-measurement-of-long-running-process-in-linux
33		function __measure_space_peak {
34		types="Peak Size Lck HWM RSS Data Stk Exe Lib PTE"
35		declare -A maxVm
36		for vm in $types; do maxVm[$vm]=0; done
37		ppid=$$
38		/usr/bin/time $@ &
39		tpid=`pgrep -P ${ppid} -n -f time`
40		if [[ ${tpid} -ne "" ]]; then
41		pid=`pgrep -P ${tpid} -n -f $1` # $! may work here but not later
42		fi
43		declare -A Vm
44		while [[ ${tpid} -ne "" ]]; do
45		for vm in $types; do
46		if [[ ${pid} -ne "" ]]; then
47		Vm[$vm]=`cat /proc/${pid}/status 2> /dev/null \
48		\| grep Vm${vm} \| awk '{print $2}'`
49		if [[ ${Vm[$vm]} -gt ${maxVm[$vm]} ]]; then
50		maxVm[$vm]=${Vm[$vm]}
51		fi
52		fi
53		done
54		sleep $sleeptime
55		savedtpid=${tpid}
56		tpid=`pgrep -P ${ppid} -n -f time`
57		done
58		wait ${savedtpid} # don't wait, job is finished
59		exitstatus=$? # catch the exit status of wait, the same of $@
60		echo "Memory usage for $@:" >> /dev/stderr
61		for vm in $types; do echo " Vm$vm: ${maxVm[$vm]} kB" >> /dev/stderr; done
62		echo "Exit status: ${exitstatus}" >> /dev/stderr
63		}
64		__measure_space_peak $*

+0

-120

~~benchmarks/gfapy-plot-benchmarkdata.R~~ less more

0		#!/usr/bin/env Rscript
1		# (c) Giorgio Gonnella, ZBH, Uni Hamburg, 2017
2
3		script.name = "./gfapy-plot-benchmarkdata.R"
4		args <- commandArgs(trailingOnly=TRUE)
5		if (is.na(args[3])) {
6		cat("Usage: ",script.name, " <inputfile> <outpfx> <variable>", "\n")
7		cat("variable: either 'segments' or 'connectivity'\n")
8		stop("Too few command-line parameters")
9		}
10		infname <- args[1]
11		cat("input data: ",infname,"\n")
12		outpfx <- args[2]
13		cat("output prefix:", outpfx, "\n")
14		xvar <- args[3]
15		if (xvar != 'segments' && xvar != 'connectivity') {
16		stop("variable must be one of: segments, connectivity")
17		}
18
19		library("ggplot2")
20
21		#
22		# The following function is described here:
23		# http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper%20functions
24		# Licence: CC0 (https://creativecommons.org/publicdomain/zero/1.0/)
25		#
26		## Gives count, mean, standard deviation, standard error of the mean, and
27		## confidence interval (default 95%).
28		## data: a data frame.
29		## measurevar: the name of a column that contains the var to be summariezed
30		## groupvars: a vector containing names of columns that contain grouping vars
31		## na.rm: a boolean that indicates whether to ignore NA's
32		## conf.interval: the percent range of the confidence interval (default 95%)
33		summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
34		conf.interval=.95, .drop=TRUE) {
35		library(plyr)
36
37		# New version of length which can handle NA's: if na.rm==T, don't count them
38		length2 <- function (x, na.rm=FALSE) {
39		if (na.rm) sum(!is.na(x))
40		else length(x)
41		}
42
43		# This does the summary. For each group's data frame, return a vector with
44		# N, mean, and sd
45		datac <- ddply(data, groupvars, .drop=.drop,
46		.fun = function(xx, col) {
47		c(N = length2(xx[[col]], na.rm=na.rm),
48		mean = mean (xx[[col]], na.rm=na.rm),
49		sd = sd (xx[[col]], na.rm=na.rm)
50		)
51		},
52		measurevar
53		)
54
55		# Rename the "mean" column
56		datac <- rename(datac, c("mean" = measurevar))
57
58		datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
59
60		# Confidence interval multiplier for standard error
61		# Calculate t-statistic for confidence interval:
62		# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
63		ciMult <- qt(conf.interval/2 + .5, datac$N-1)
64		datac$ci <- datac$se * ciMult
65
66		return(datac)
67		}
68
69		data <- read.table(infname, header=T, sep="\t")
70
71		if (xvar == "segments") {
72		xvarname = "lines"
73		xlab="Lines (segments 1/3; dovetails 2/3)"
74		} else {
75		xvarname = "mult"
76		xlab="Dovetails/segment (segments=4000)"
77		data[c("lines")] = (data[c("mult")]+1)*4000
78		}
79
80		time.data <- summarySE(data, measurevar="time", groupvars=c(xvarname))
81		outfname = paste0(outpfx,"_time.log")
82		sink(outfname)
83		print(time.data)
84		time.lm <- lm(time ~ lines, data=data)
85		summary(time.lm)
86		time.nls <- nls(time ~ b + a * lines,
87		data=data, start=list(a=0,b=0),
88		algorithm="port", lower=c(0,0))
89		print(time.nls)
90		sink()
91
92		outfname = paste0(outpfx,"_space.log")
93		sink(outfname)
94		space.data <- summarySE(data, measurevar="space", groupvars=c(xvarname))
95		print(space.data)
96		space.lm <- lm(space ~ lines, data=data)
97		summary(space.lm)
98		space.nls <- nls(space ~ b + a * lines,
99		data=data, start=list(a=0,b=0),
100		algorithm="port", lower=c(0,0))
101		print(space.nls)
102		sink()
103
104		outfname = paste0(outpfx,"_time.pdf")
105		pdf(outfname)
106		print(ggplot(time.data, aes_string(x=xvarname, y="time")) +
107		geom_errorbar(aes(ymin=time-se, ymax=time+se), width=2) +
108		geom_line(size=0.2) + geom_point(size=3) +
109		ylab("Total elapsed time (s)") +
110		xlab(xlab))
111		outfname = paste0(outpfx,"_space.pdf")
112		pdf(outfname)
113		print(ggplot(space.data, aes_string(x=xvarname, y="space")) +
114		geom_errorbar(aes(ymin=space-se, ymax=space+se), width=2) +
115		geom_line(size=0.2) + geom_point(size=3) +
116		ylab("Memory peak (MB)") +
117		xlab(xlab))
118		dev.off()
119

+0

-62

~~benchmarks/gfapy-plot-preparedata.py~~ less more

0		#!/usr/bin/env python3
1		"""
2		Prepare the output of the convert benchmark script for the R plotting script.
3		"""
4
5		import argparse
6		import os
7		import sys
8		import re
9
10		op = argparse.ArgumentParser(description=__doc__)
11		op.add_argument('--version', action='version', version='%(prog)s 1.0')
12		op.add_argument("--mult", "-m", action="store_true",
13		help="set if variable n of edges/segment")
14		op.add_argument("inputfile")
15		opts = op.parse_args()
16
17		if not os.path.exists(opts.inputfile):
18		sys.stderr.write("Input file not found: {}\n".format(opts.inputfile))
19		exit(1)
20
21		with open(opts.inputfile) as inputfile:
22		header = True
23		if opts.mult:
24		outdata = ["mult", "time", "space", "time_per_line", "space_per_line"]
25		else:
26		outdata = ["lines", "time", "space", "time_per_line", "space_per_line"]
27		print("\t".join(outdata))
28		for line in inputfile:
29		if line[:3] == "###":
30		header = False
31		elif not header:
32		data = line.rstrip("\n\r").split("\t")
33		n_segments = data[2]
34		multiplier = data[3]
35		n_lines = int(int(n_segments) * (1+float(multiplier)))
36		elapsed = data[5]
37		elapsed_match = re.compile(r'\s+(\d+):(\d+\.\d+)').match(elapsed)
38		if elapsed_match:
39		minutes = int(elapsed_match.groups()[0])
40		seconds = float(elapsed_match.groups()[1])
41		seconds += minutes * 60
42		else:
43		elapsed_match = re.compile(r'\s+(\d+):(\d+):(\d+)').match(elapsed)
44		if elapsed_match:
45		hours = int(elapsed_match.groups()[0])
46		minutes = int(elapsed_match.groups()[1])
47		seconds = int(elapsed_match.groups()[2])
48		minutes += hours * 60
49		seconds += minutes * 60
50		else:
51		continue
52		memory = data[6]
53		memory = int(re.compile(r'(\d+) kB').match(memory).groups()[0])
54		megabytes = memory / 1024
55		if opts.mult:
56		outdata = [str(multiplier)]
57		else:
58		outdata = [str(n_lines)]
59		outdata += [str(seconds),str(megabytes),
60		str(seconds/n_lines), str(megabytes/n_lines)]
61		print("\t".join(outdata))

+0

-61

~~benchmarks/gfapy-profiler.sh~~ less more

0		#!/bin/bash
1		#$ -clear
2		#$ -q 16c.q
3		#$ -cwd
4		#$ -V
5		#$ -S /bin/bash
6		#$ -o jobs_out
7		#$ -j y
8
9		if [ $# -ne 4 ]; then
10		echo "Usage: $0 <operation> <version> <variable> <range>" > /dev/stderr
11		echo " operation: (mergelinear/convert) ../bin/gfapy-<operation> <gfafile> will be called" > /dev/stderr
12		echo " version: (gfa1/gfa2) gfa version" > /dev/stderr
13		echo " variable: (segments/connectivity)" > /dev/stderr
14		echo " range: (all/fast/slow)" > /dev/stderr
15		exit 1
16		fi
17
18		operation=$1
19		version=$2
20		variable=$3
21		range=$4
22
23		if [ $variable == "segments" ]; then
24		if [ $range == "fast" ]; then
25		nsegments="1000 2000 4000 8000 16000 32000 64000 128000"
26		elif [ $range == "slow" ]; then
27		nsegments="256000 512000 1024000 2048000 4096000"
28		elif [ $range == "all"]; then
29		nsegments="1000 2000 4000 8000 16000 32000 64000 128000 256000 512000 1024000 2048000 4096000"
30		fi
31		else
32		nsegments=4000
33		fi
34
35		if [ $variable == "connectivity" ]; then
36		if [ $range == "fast" ]; then
37		multipliers="2 4 8 16 32 64"
38		elif [ $range == "slow" ]; then
39		multipliers="128 256"
40		elif [ $range == "all"]; then
41		multipliers="2 4 8 16 32 64 128 256"
42		fi
43		else
44		multipliers=2
45		fi
46
47		replicate=1
48		for i in $nsegments; do
49		for m in $multipliers; do
50		fname="${i}_e${m}x.$replicate.${version}"
51		if [ ! -e $fname ]; then
52		./gfapy-randomgraph --segments $i -g $version \
53		--dovetails-per-segment $m --with-sequence > $fname
54		fi
55		echo "Profiling $operation $fname ..."
56		rm -f $fname.$operation.prof
57		python3 -m cProfile -o $fname.$operation.prof \
58		../bin/gfapy-$operation $fname 1> /dev/null
59		done
60		done

+0

-87

~~benchmarks/gfapy-randomgraph~~ less more

0		#!/usr/bin/env python3
1		"""
2		Creates a random graph for testing
3		"""
4
5		import argparse
6		import sys
7		import random
8
9		op = argparse.ArgumentParser(description=__doc__)
10		op.add_argument("--segments", "-s", type=int,
11		help="number of segments", required=True)
12		op.add_argument("--slen", "-l", type=int, default=100,
13		help="lenght of segments sequence")
14		op.add_argument("--with-sequence", "-w", action="store_true")
15		op.add_argument("--dovetails-per-segment", "-d",
16		help="average number of dovetail edges per segment",
17		default=2.0, type=float)
18		op.add_argument('--gfa-version', "-g", default="gfa1",
19		help="gfa version", choices=("gfa1", "gfa2"))
20		op.add_argument('--version', action='version', version='%(prog)s 1.0')
21		opts = op.parse_args()
22
23		if opts.segments < 0:
24		sys.stderr.write("Error: the number of segments must be "+
25		">= 0 ({})\n".format(opts.segments))
26		exit(1)
27		if opts.dovetails_per_segment < 0:
28		sys.stderr.write("Error: the average number of dovetails per segment must "+
29		"be >= 0 ({})\n".format(opts.dovetails_per_segment))
30		exit(1)
31		if opts.slen <= 0:
32		sys.stderr.write("Error: the length of segments sequence must be > 0"+
33		" ({})\n".format(opts.slen))
34		exit(1)
35
36		if opts.gfa_version == "gfa1":
37		print("H\tVN:Z:1.0")
38		else:
39		print("H\tVN:Z:2.0")
40
41		def random_sequence(slen):
42		sequence = []
43		for i in range(slen):
44		sequence.append(random.choice('ACGT'))
45		return "".join(sequence)
46
47		for i in range(opts.segments):
48		if opts.with_sequence:
49		sequence = random_sequence(opts.slen)
50		else:
51		sequence = "*"
52		if opts.gfa_version == "gfa1":
53		print("S\ts{}\t{}\tLN:i:{}".format(i, sequence, opts.slen))
54		else:
55		print("S\ts{}\t{}\t{}".format(i, opts.slen, sequence))
56
57		n_dovetails = int(opts.segments * opts.dovetails_per_segment)
58		edges = {}
59		for i in range(n_dovetails):
60		edge = False
61		while not edge:
62		s_from = random.randint(0, opts.segments-1)
63		s_from_or = random.choice('+-')
64		s_to = random.randint(0, opts.segments-1)
65		s_to_or = random.choice('+-')
66		if s_from not in edges:
67		edges[s_from] = {'+': {}, '-': {}}
68		if s_to not in edges[s_from][s_from_or]:
69		edges[s_from][s_from_or][s_to] = {'+': False, '-': False}
70		if not edges[s_from][s_from_or][s_to][s_to_or]:
71		edges[s_from][s_from_or][s_to][s_to_or] = True
72		edge = True
73		ovlen = opts.slen//10
74		if ovlen == 0: ovlen = 1
75		cigar = "{}M".format(ovlen)
76		if opts.gfa_version == "gfa1":
77		print("L\ts{}\t{}\ts{}\t{}\t{}\tID:Z:e{}".format(s_from, s_from_or, s_to,
78		s_to_or, cigar, i))
79		else:
80		s_from_begin = opts.slen - ovlen if s_from_or == "+" else 0
81		s_from_end = "{}$".format(opts.slen) if s_from_or == "+" else ovlen
82		s_to_begin = opts.slen - ovlen if s_to_or == "-" else 0
83		s_to_end = "{}$".format(opts.slen) if s_to_or == "-" else ovlen
84		print("E\te{}\ts{}{}\ts{}{}\t{}\t{}\t{}\t{}\t{}".format(
85		i, s_from, s_from_or, s_to, s_to_or, s_from_begin, s_from_end,
86		s_to_begin, s_to_end, cigar))

+0

-76

~~benchmarks/gfapy-reproduce-manuscript-figure.py~~ less more

0		#!/usr/bin/env python3
1		"""
2		Run the benchmarks necessary to reproduce the figures of Section 3
3		of the Supplementary Information of the manuscript \"Gfapy: a flexible
4		and extensible software library for handling sequence graphs in Python\"
5		and plots the figures using R.
6		"""
7
8		import argparse
9		import os
10
11		op = argparse.ArgumentParser(description=__doc__)
12		op.add_argument("fignum", help="Figure number", type=int,
13		choices=range(5,9))
14		op.add_argument("--queue", default=None,
15		help="Use the specified queue of a Grid Engine cluster system "+
16		"(e.g. 16c.q). If not provided, the benchmarks are run on the "+
17		"local computer.")
18		op.add_argument("--nrepl",type=int, default=3,
19		help="Number of replicates (default: 3)")
20		op.add_argument("--fast",action="store_true",
21		help="Run only the three fastest datapoints of the benchmark")
22		opts = op.parse_args()
23
24		if opts.fignum == 5:
25		testvar="segments"
26		operation="convert"
27		elif opts.fignum == 6:
28		testvar="connectivity"
29		operation="convert"
30		elif opts.fignum == 7:
31		testvar="segments"
32		operation="mergelinear"
33		else: # 8
34		testvar="connectivity"
35		operation="mergelinear"
36
37		if opts.fast:
38		subset="fast"
39		else:
40		subset="all"
41
42		run_benchmarks_args="figure{}.out {} gfa2 {} {} {}".format(
43		opts.fignum, operation, testvar, subset, opts.nrepl)
44
45		if not opts.queue:
46		os.system("./gfapy-run-benchmarks.sh {}".format(run_benchmarks_args))
47		else:
48		qsub_script_pfx=\
49		"""#!/bin/bash
50		#$ -clear
51		#$ -q {}
52		#$ -cwd
53		#$ -V
54		#$ -S /bin/bash
55		#$ -o jobs_out
56		#$ -j y
57		#$ -sync y
58
59		""".format(opts.queue)
60		with open("gfapy-run-benchmarks.sh", "r") as input_file:
61		content = input_file.read()
62		with open("gfapy-run-benchmarks.qsub", "w") as output_file:
63		output_file.write(qsub_script_pfx)
64		output_file.write(content)
65		os.system("mkdir -p jobs_out")
66		os.system("qsub gfapy-run-benchmarks.qsub {}".format(run_benchmarks_args))
67
68		if testvar == "segments":
69		prepareflag=""
70		else:
71		prepareflag="--mult"
72		os.system("./gfapy-plot-preparedata.py {} figure{}.out > figure{}.dat".format(
73		prepareflag, opts.fignum, opts.fignum))
74		os.system("./gfapy-plot-benchmarkdata.R figure{}.dat figure{} {}".format(
75		opts.fignum, opts.fignum, testvar))

+0

-67

~~benchmarks/gfapy-run-benchmarks.sh~~ less more

0		#!/bin/bash
1
2		if [ $# -ne 6 ]; then
3		echo "Usage: $0 <outfile> <operation> <version> <variable> <range> <nrepl>" > /dev/stderr
4		echo " outfile: will be overwritten if exists" > /dev/stderr
5		echo " operation: (mergelinear/convert) ../bin/gfapy-<operation> <gfafile> will be called" > /dev/stderr
6		echo " version: (gfa1/gfa2) gfa version" > /dev/stderr
7		echo " variable: (segments/connectivity)" > /dev/stderr
8		echo " range: (all/fast/slow)" > /dev/stderr
9		echo " nrepl: (e.g. 3) number of replicates" > /dev/stderr
10		exit 1
11		fi
12
13		outfile=$1
14		operation=$2
15		version=$3
16		variable=$4
17		range=$5
18		nrepl=$6
19
20		if [ $variable == "segments" ]; then
21		if [ $range == "fast" ]; then
22		nsegments="1000 2000 4000"
23		elif [ $range == "slow" ]; then
24		nsegments="8000 16000 32000 64000 128000 256000 512000 1024000 2048000"
25		elif [ $range == "all"]; then
26		nsegments="1000 2000 4000 8000 16000 32000 64000 128000 256000 512000 1024000 2048000"
27		fi
28		else
29		nsegments=4000
30		fi
31
32		if [ $variable == "connectivity" ]; then
33		if [ $range == "fast" ]; then
34		multipliers="2 4 8"
35		elif [ $range == "slow" ]; then
36		multipliers="16 32 64 128 256"
37		elif [ $range == "all"]; then
38		multipliers="2 4 8 16 32 64 128 256"
39		fi
40		else
41		multipliers=2
42		fi
43
44		mkdir -p benchmark_results
45		rm -f $outfile
46		echo "# hostname: $HOSTNAME" > $outfile
47		echo "### benchmark data:" >> $outfile
48		for ((replicate=1;replicate<=nrepl;++replicate)); do
49		for i in $nsegments; do
50		for m in $multipliers; do
51		fname="benchmark_results/${i}_e${m}x.$replicate.${version}"
52		bmout="$fname.$operation.benchmark"
53		rm -f $bmout
54		if [ ! -e $fname ]; then
55		./gfapy-randomgraph --segments $i -g $version \
56		--dovetails-per-segment $m --with-sequence > $fname
57		fi
58		./gfapy-benchmark-collectdata ../bin/gfapy-$operation $fname \
59		1> /dev/null 2> $bmout
60		elapsed=$(grep -P -o "(?<=) [^ ]*(?=elapsed)" $bmout)
61		memory=$(grep -P -o "(?<=VmHWM: ).*" $bmout)
62		filesize=( $(ls -ln $fname) );filesize=${filesize[4]}
63		echo -e "gfapy-$operation\t$version\t$i\t$m\t$replicate\t$elapsed\t$memory\t$filesize" >> $outfile
64		done
65		done
66		done

+0

-47

~~bin/gfapy-diff~~ less more

0		#!/usr/bin/env python3
1		"""
2		Compare two GFA files
3
4		Note: the current version is not yet functional and only checking segments.
5		Work in progress.
6		"""
7
8		import sys
9		import os
10		import gfapy
11		import argparse
12
13		op = argparse.ArgumentParser(description=__doc__)
14		op.add_argument('--version', action='version', version='%(prog)s 0.1')
15		op.add_argument("filename1")
16		op.add_argument("filename2")
17		opts = op.parse_args()
18
19		gfa1 = gfapy.Gfa.from_file(opts.filename1)
20		gfa2 = gfapy.Gfa.from_file(opts.filename2)
21
22		different = False
23
24		if gfa1.version != gfa2.version:
25		print("# different version")
26		exit(1)
27		else:
28		for s in gfa1.segments:
29		s2 = gfa2.segment(s)
30		if s2 is None:
31		different = True
32		print("# segment {} in {} but not in {}".format(s.name, opts.filename1, opts.filename2))
33		if s.diff(s2):
34		different = True
35		for diff in s.diff(s2):
36		print(diff)
37		for s in gfa2.segments:
38		s1 = gfa1.segment(s)
39		if s1 is None:
40		different = True
41		print("# segment {} in {} but not in {}".format(s.name, opts.filename2, opts.filename1))
42
43		if different:
44		exit(1)
45		else:
46		exit(0)

+0

-2

~~doc/.gitignore~~ less more

0		source
1		_build

+0

-23

~~doc/Makefile~~ less more

0		# Minimal makefile for Sphinx documentation
1		#
2
3		# You can set these variables from the command line.
4		SPHINXOPTS =
5		SPHINXBUILD = sphinx-build
6		SPHINXPROJ = Gfapy
7		SOURCEDIR = .
8		BUILDDIR = _build
9
10		# Put it first so that "make" without argument is like "make help".
11		help:
12		@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
13
14		.PHONY: help Makefile
15
16		cleanup:
17		rm source _build -rf
18
19		# Catch-all target: route all unknown targets to Sphinx using the new
20		# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
21		%: Makefile
22		@PYTHONHASHSEED=0 $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

+0

-4

~~doc/changelog.rst~~ less more

0		Changelog
1		---------
2		.. include:: ../CHANGES.txt
3		:literal:

+0

-173

~~doc/conf.py~~ less more

0		#!/usr/bin/env python3
1		# -- coding: utf-8 --
2		#
3		# Gfapy documentation build configuration file, created by
4		# sphinx-quickstart on Thu Mar 16 10:13:57 2017.
5		#
6		# This file is execfile()d with the current directory set to its
7		# containing dir.
8		#
9		# Note that not all possible configuration values are present in this
10		# autogenerated file.
11		#
12		# All configuration values have a default; values that are commented out
13		# serve to show the default.
14
15		# If extensions (or modules to document with autodoc) are in another directory,
16		# add these directories to sys.path here. If the directory is relative to the
17		# documentation root, use os.path.abspath to make it absolute, like shown here.
18		#
19		import os
20		import sys
21		sys.path.insert(0, os.path.abspath('.'))
22		sys.path.insert(0, os.path.abspath('../'))
23
24		# -- General configuration ------------------------------------------------
25
26		# If your documentation needs a minimal Sphinx version, state it here.
27		#
28		# needs_sphinx = '1.0'
29
30		# Add any Sphinx extension module names here, as strings. They can be
31		# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
32		# ones.
33		extensions = [
34		'sphinx.ext.autodoc',
35		'sphinx.ext.doctest',
36		'sphinx.ext.todo',
37		'sphinx.ext.coverage',
38		'sphinx.ext.imgmath',
39		'sphinx.ext.ifconfig',
40		'sphinx.ext.viewcode',
41		'sphinx.ext.githubpages',
42		'sphinx.ext.napoleon'
43		]
44
45		# Napoleon
46		napoleon_numpy_docstring = True
47		napoleon_google_docstring = True
48		napoleon_use_param = False
49		napoleon_use_ivar = True
50
51		# Default role:
52		default_role = 'any'
53
54		# Add any paths that contain templates here, relative to this directory.
55		templates_path = ['_templates']
56
57		# The suffix(es) of source filenames.
58		# You can specify multiple suffix as a list of string:
59		#
60		# source_suffix = ['.rst', '.md']
61		source_suffix = '.rst'
62
63		# The master toctree document.
64		master_doc = 'index'
65
66		# General information about the project.
67		project = 'Gfapy'
68		copyright = '2017, Giorgio Gonnella and others (see CONTRIBUTORS)'
69		author = 'Giorgio Gonnella and others (see CONTRIBUTORS)'
70
71		# The version info for the project you're documenting, acts as replacement for
72		# \|version\| and \|release\|, also used in various other places throughout the
73		# built documents.
74		#
75		# The short X.Y version.
76		version = '1.1'
77		# The full version, including alpha/beta/rc tags.
78		release = '1.1.0'
79
80		# The language for content autogenerated by Sphinx. Refer to documentation
81		# for a list of supported languages.
82		#
83		# This is also used if you do content translation via gettext catalogs.
84		# Usually you set "language" from the command line for these cases.
85		language = None
86
87		# List of patterns, relative to source directory, that match files and
88		# directories to ignore when looking for source files.
89		# This patterns also effect to html_static_path and html_extra_path
90		exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
91
92		# The name of the Pygments (syntax highlighting) style to use.
93		pygments_style = 'sphinx'
94
95		# If true, `todo` and `todoList` produce output, else they produce nothing.
96		todo_include_todos = False
97
98		# -- Options for HTML output ----------------------------------------------
99
100		# The theme to use for HTML and HTML Help pages. See the documentation for
101		# a list of builtin themes.
102		#
103		html_theme = 'sphinx_rtd_theme'
104
105		# Theme options are theme-specific and customize the look and feel of a theme
106		# further. For a list of options available for each theme, see the
107		# documentation.
108		#
109		# html_theme_options = {}
110
111		# Add any paths that contain custom static files (such as style sheets) here,
112		# relative to this directory. They are copied after the builtin static files,
113		# so a file named "default.css" will overwrite the builtin "default.css".
114		html_static_path = ['_static']
115
116
117		# -- Options for HTMLHelp output ------------------------------------------
118
119		# Output file base name for HTML help builder.
120		htmlhelp_basename = 'Gfapydoc'
121
122
123		# -- Options for LaTeX output ---------------------------------------------
124
125		latex_elements = {
126		# The paper size ('letterpaper' or 'a4paper').
127		#
128		'papersize': 'a4paper',
129
130		# The font size ('10pt', '11pt' or '12pt').
131		#
132		# 'pointsize': '10pt',
133
134		# Additional stuff for the LaTeX preamble.
135		#
136		# 'preamble': '',
137
138		# Latex figure (float) alignment
139		#
140		# 'figure_align': 'htbp',
141		}
142
143		# Grouping the document tree into LaTeX files. List of tuples
144		# (source start file, target name, title,
145		# author, documentclass [howto, manual, or own class]).
146		latex_documents = [
147		(master_doc, 'Gfapy.tex', 'Gfapy Documentation',
148		'Giorgio Gonnella', 'manual'),
149		]
150
151		# -- Options for manual page output ---------------------------------------
152
153		# One entry per manual page. List of tuples
154		# (source start file, name, description, authors, manual section).
155		man_pages = [
156		(master_doc, 'gfapy', 'Gfapy Documentation',
157		[author], 1)
158		]
159
160
161		# -- Options for Texinfo output -------------------------------------------
162
163		# Grouping the document tree into Texinfo files. List of tuples
164		# (source start file, target name, title, author,
165		# dir menu entry, description, category)
166		texinfo_documents = [
167		(master_doc, 'Gfapy', 'Gfapy Documentation',
168		author, 'Gfapy',
169		'Python library for the Graphic Fragment Assembly (GFA) format.',
170		'Miscellaneous'),
171		]
172

+0

-36

~~doc/index.rst~~ less more

0		.. Gfapy documentation master file, created by
1		sphinx-quickstart on Thu Mar 16 10:13:57 2017.
2		You can adapt this file completely to your liking, but it should at least
3		contain the root `toctree` directive.
4
5		Gfapy documentation
6		===================
7
8		.. toctree::
9		:maxdepth: 2
10		:caption: Contents:
11
12		readme
13		changelog
14
15		tutorial/gfa
16		tutorial/validation
17		tutorial/positional_fields
18		tutorial/placeholders
19		tutorial/positions
20		tutorial/alignments
21		tutorial/tags
22		tutorial/references
23		tutorial/header
24		tutorial/custom_records
25		tutorial/comments
26		tutorial/errors
27		tutorial/graph_operations
28		tutorial/rgfa
29
30		Indices and tables
31		==================
32
33		* :ref:`genindex`
34		* :ref:`modindex`
35		* :ref:`search`

+0

-3

~~doc/readme.rst~~ less more

0		Introduction
1		============
2		.. include:: ../README.rst

+0

-1

~~doc/run_apidoc.sh~~ less more

0

sphinx-apidoc -o source/ ../gfapy

+0

-238

~~doc/tutorial/alignments.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		from gfapy import is_placeholder, Alignment
4		h = "H\tVN:Z:2.0\tTS:i:100"
5		sA = "S\tA\t100\t*"
6		sB = "S\tB\t100\t*"
7		x = "E\tx\tA+\tB-\t0\t100$\t0\t100$\t4,2\tTS:i:50"
8		gfa = gfapy.Gfa([h, sA, sB, x])
9
10		.. _alignments:
11
12		Alignments
13		~~~~~~~~~~
14
15		Some GFA1 (L/C overlap, P overlaps) and GFA2 (E/F alignment) fields contain
16		alignments or lists of alignments. The alignment can be left unspecified and a
17		placeholder symbol ``*`` used instead. In GFA1 the alignments can be given as
18		CIGAR strings, in GFA2 also as Dazzler traces.
19
20		Gfapy uses three different classes for representing the content of alignment fields:
21		:class:`~gfapy.alignment.cigar.CIGAR`, :class:`~gfapy.alignment.trace.Trace`
22		and :class:`~gfapy.alignment.placeholder.AlignmentPlaceholder`.
23
24		Creating an alignment
25		^^^^^^^^^^^^^^^^^^^^^
26
27		An alignment instance is usually created from its GFA string
28		representation or from a list by using the
29		:class:`gfapy.Alignment() <gfapy.alignment.alignment.Alignment>`
30		constructor.
31
32		.. doctest::
33
34		>>> from gfapy import Alignment
35		>>> Alignment("*")
36		gfapy.AlignmentPlaceholder()
37		>>> Alignment("10,10,10")
38		gfapy.Trace([10,10,10])
39		>>> Alignment([10,10,10])
40		gfapy.Trace([10,10,10])
41		>>> Alignment("30M2I")
42		gfapy.CIGAR([gfapy.CIGAR.Operation(30,'M'), gfapy.CIGAR.Operation(2,'I')])
43
44		If the argument is an alignment object it will be returned,
45		so that is always safe to call the method on a variable which can
46		contain a string or an alignment instance:
47
48		.. doctest::
49
50		>>> Alignment(Alignment("*"))
51		gfapy.AlignmentPlaceholder()
52		>>> Alignment(Alignment("10,10"))
53		gfapy.Trace([10,10])
54
55		Recognizing undefined alignments
56		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
57
58		The :func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>` method
59		allows to test if an alignment field contains an undefined value (placeholder)
60		instead of a defined value (CIGAR string, trace). The method accepts as
61		argument either an alignment object or a string or list representation.
62
63		.. doctest::
64
65		>>> from gfapy import is_placeholder, Alignment
66		>>> is_placeholder(Alignment("30M"))
67		False
68		>>> is_placeholder(Alignment("10,10"))
69		False
70		>>> is_placeholder(Alignment("*"))
71		True
72		>>> is_placeholder("*")
73		True
74		>>> is_placeholder("30M")
75		False
76		>>> is_placeholder("10,10")
77		False
78		>>> is_placeholder([])
79		True
80		>>> is_placeholder([10,10])
81		False
82
83		Note that, as a placeholder is ``False`` in boolean context, just a
84		``if not aligment`` will also work, if alignment is an alignment object.
85		But this of course, does not work, if it is a string representation.
86		Therefore it is better to use the
87		:func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>` method,
88		which works in both cases.
89
90		.. doctest::
91
92		>>> if not Alignment("*"): print('no alignment')
93		no alignment
94		>>> if is_placeholder(Alignment("*")): print('no alignment')
95		no alignment
96		>>> if "*": print('not a placeholder...?')
97		not a placeholder...?
98		>>> if is_placeholder("*"): print('really? it is a placeholder!')
99		really? it is a placeholder!
100
101		Reading and editing CIGARs
102		^^^^^^^^^^^^^^^^^^^^^^^^^^
103
104		CIGARs are represented by specialized lists, instances of the class
105		:class:`~gfapy.alignment.cigar.CIGAR`, whose elements are CIGAR operations
106		CIGAR operations are represented by instance of the class
107		:class:`~gfapy.alignment.cigar.CIGAR.Operation`,
108		and provide the properties ``length`` (length of the operation, an integer)
109		and ``code`` (one-letter string which specifies the type of operation).
110		Note that not all operations allowed in SAM files (for which CIGAR strings
111		were first defined) are also meaningful in GFA and thus GFA2 only allows
112		the operations ``M``, ``I``, ``D`` and ``P``.
113
114		.. doctest::
115
116		>>> cigar = gfapy.Alignment("30M")
117		>>> isinstance(cigar, list)
118		True
119		>>> operation = cigar[0]
120		>>> type(operation)
121		<class 'gfapy.alignment.cigar.CIGAR.Operation'>
122		>>> operation.code
123		'M'
124		>>> operation.code = 'D'
125		>>> operation.length
126		30
127		>>> len(operation)
128		30
129		>>> str(operation)
130		'30D'
131
132		As a CIGAR instance is a list, list methods apply to it. If the array is
133		emptied, its string representation will be the placeholder symbol ``*``.
134
135		.. doctest::
136
137		>>> cigar = gfapy.Alignment("1I20M2D")
138		>>> cigar[0].code = "M"
139		>>> cigar.pop(1)
140		gfapy.CIGAR.Operation(20,'M')
141		>>> str(cigar)
142		'1M2D'
143		>>> cigar[:] = []
144		>>> str(cigar)
145		'*'
146
147		The validate :func:`CIGAR.validate() <gfapy.alignment.cigar.CIGAR.validate>`
148		function checks if a CIGAR instance is valid. A version can be provided, as the
149		CIGAR validation is version specific (as GFA2 forbids some CIGAR operations).
150
151		.. doctest::
152
153		>>> cigar = gfapy.Alignment("30M10D20M5I10M")
154		>>> cigar.validate()
155		>>> cigar[1].code = "L"
156		>>> cigar.validate()
157		Traceback (most recent call last):
158		...
159		gfapy.error.ValueError:
160		>>> cigar = gfapy.Alignment("30M10D20M5I10M")
161		>>> cigar[1].code = "X"
162		>>> cigar.validate(version="gfa1")
163		>>> cigar.validate(version="gfa2")
164		Traceback (most recent call last):
165		...
166		gfapy.error.ValueError:
167
168		Reading and editing traces
169		^^^^^^^^^^^^^^^^^^^^^^^^^^
170
171		Traces are arrays of non-negative integers. The values are interpreted
172		using a trace spacing value. If traces are used, a trace spacing value
173		must be defined in a TS integer tag, either in the header, or in the
174		single lines which contain traces (which takes precedence over the
175		header global value).
176
177		.. doctest::
178
179		>>> print(gfa) #doctest: +SKIP
180		H TS:i:100
181		E x A+ B- 0 100$ 0 100$ 4,2 TS:i:50
182		...
183		>>> gfa.header.TS
184		100
185		>>> gfa.line("x").TS
186		50
187
188		Query, reference and complement
189		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
190
191		CIGARs are asymmetric, i.e.\ they consider one sequence as reference and
192		another sequence as query.
193
194		The :func:`~gfapy.alignment.cigar.CIGAR.length_on_reference` and
195		:func:`~gfapy.alignment.cigar.CIGAR.length_on_query` methods compute the length
196		of the alignment on the two sequences. These methods are used by the library
197		e.g. to convert GFA1 L lines to GFA2 E lines (which is only possible if CIGARs
198		are provided).
199
200		.. doctest::
201
202		>>> cigar = gfapy.Alignment("30M10D20M5I10M")
203		>>> cigar.length_on_reference()
204		70
205		>>> cigar.length_on_query()
206		65
207
208		CIGARs are dependent on which sequence is taken as reference and which
209		is taken as query. For each alignment, a complement CIGAR can be
210		computed using the method
211		:func:`~gfapy.alignment.cigar.CIGAR.complement`; it is the CIGAR obtained
212		when the two sequences are switched.
213
214		.. doctest::
215
216		>>> cigar = gfapy.Alignment("2M1D3M")
217		>>> str(cigar.complement())
218		'3M1I2M'
219
220		The current version of Gfapy does not provide a way to compute the
221		alignment, thus the trace information can be accessed and edited, but
222		not used for this purpose. Because of this there is currently no way in
223		Gfapy to compute a complement trace (trace obtained when the sequences
224		are switched).
225
226		.. doctest::
227
228		>>> trace = gfapy.Alignment("1,2,3")
229		>>> str(trace.complement())
230		'*'
231
232		The complement of a placeholder is a placeholder:
233
234		.. doctest::
235
236		>>> str(gfapy.Alignment("*").complement())
237		'*'

+0

-71

~~doc/tutorial/comments.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		g = gfapy.Gfa()
4
5		.. _comments:
6
7		Comments
8		--------
9
10		GFA lines starting with a ``#`` symbol are considered comments. In Gfapy
11		comments are represented by instances of the class :class:`gfapy.line.Comment
12		<gfapy.line.comment.comment.Comment>`. They have a similar interface to other
13		line instances, with some differences, e.g. they do not support tags.
14
15		The comments collection
16		~~~~~~~~~~~~~~~~~~~~~~~
17
18		The comments of a Gfa object are accessed using the :func:`Gfa.comments
19		<gfapy.lines.collections.Collections.comments>` property. This is a list of
20		comment line instances. The single elements can be modified, but the list
21		itself is read-only. To remove a comment from the Gfa, you need to find the
22		instance in the list, and call
23		:func:`~gfapy.line.common.disconnection.Disconnection.disconnect` on it. To
24		add a comment to a :class:`~gfapy.gfa.Gfa` instance is done similarly to other
25		lines, by using the :func:`Gfa.add_line(line)
26		<gfapy.lines.creators.Creators.add_line>` method.
27
28		.. doctest::
29
30		>>> g.add_line("# this is a comment") #doctest: +ELLIPSIS
31		>>> [str(c) for c in g.comments]
32		['# this is a comment']
33		>>> g.comments[0].disconnect()
34		>>> g.comments
35		[]
36
37		Accessing the comment content
38		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
39
40		The content of the comment line, excluding the initial ``#`` and eventual
41		initial spacing characters, is included in the ``content`` field. The initial
42		spacing characters can be read/changed using the ``spacer`` field. The default
43		value is a single space.
44
45		.. doctest::
46
47		>>> g.add_line("# this is a comment") #doctest: +ELLIPSIS
48		>>> c = g.comments[-1]
49		>>> c.content
50		'this is a comment'
51		>>> c.spacer
52		' '
53		>>> c.spacer = '___'
54		>>> str(c)
55		'#___this is a comment'
56
57		Tags are not supported by comment lines. If the line contains tags,
58		these are nor parsed, but included in the ``content`` field. Trying to set
59		tags raises exceptions.
60
61		.. doctest::
62
63		>>> c = gfapy.Line("# this is not a tag\txx:i:1")
64		>>> c.content
65		'this is not a tag\txx:i:1'
66		>>> c.xx
67		>>> c.xx = 1
68		Traceback (most recent call last):
69		...
70		gfapy.error.RuntimeError: Tags of comment lines cannot be set

+0

-296

~~doc/tutorial/custom_records.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		g = gfapy.Gfa(version = 'gfa2')
4
5		.. _custom_records:
6
7		Custom records
8		--------------
9
10		The GFA2 specification considers each line which starts with a non-standard
11		record type a custom (i.e. user- or program-specific) record.
12		Gfapy allows to retrieve these records and access their data using a
13		similar interface to that for the predefined record types.
14
15		Retrieving, adding and deleting custom records
16		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17
18		Gfa instances have the property
19		:func:`~gfapy.lines.collections.Collections.custom_records`,
20		a list of all line instances with a non-standard record type. Among these,
21		records of a specific record type are retrieved using the method
22		:func:`Gfa.custom_records_of_type(record_type)
23		<gfapy.lines.collections.Collections.custom_records_of_type>`.
24		Lines are added and deleted using the same methods
25		(:func:`~gfapy.lines.creators.Creators.add_line` and
26		:func:`~gfapy.line.common.disconnection.Disconnection.disconnect`) as for
27		other line types.
28
29		.. doctest::
30
31		>>> g.add_line("X\tcustom line") #doctest: +ELLIPSIS
32		>>> g.add_line("Y\tcustom line") #doctest: +ELLIPSIS
33		>>> [str(line) for line in g.custom_records] #doctest: +SKIP
34		['X\tcustom line', 'Y\tcustom line']
35		>>> g.custom_record_keys) #doctest: +SKIP
36		['X', 'Y']
37		>>> [str(line) for line in g.custom_records_of_type('X')]
38		['X\tcustom line']
39		>>> g.custom_records_of_type("X")[-1].disconnect()
40		>>> g.custom_records_of_type('X')
41		[]
42
43		Interface without extensions
44		~~~~~~~~~~~~~~~~~~~~~~~~~~~~
45
46		If no extension (see :ref:`extensions` section) has been defined to handle a
47		custom record type, the interface has some limitations: the field content is
48		not validated, and the field names are unknown. The generic custom record
49		class is employed
50		(:class:`~gfapy.line.custom_record.custom_record.CustomRecord`).
51
52		As the name of the positional fields in a custom record is not known, a generic
53		name ``field1``, ``field2``, ... is used. The number of positional fields is
54		found by getting the length of the
55		:attr:`~gfapy.line.custom_record.init.Init.positional_fieldnames` list.
56
57		.. doctest::
58
59		>>> g.add_line("X\ta\tb\tcc:i:10\tdd:i:100") #doctest: +ELLIPSIS
60		>>> x = g.custom_records_of_type('X')[-1]
61		>>> len(x.positional_fieldnames)
62		2
63		>>> x.field1
64		'a'
65		>>> x.field2
66		'b'
67
68		Positional fields are allowed to contain any character (including non-printable
69		characters and spacing characters), except tabs and newlines (as they are
70		structural elements of the line). No further validation is performed.
71
72		As Gfapy cannot know how many positional fields are present when parsing custom
73		records, a heuristic approach is followed, to identify tags. A field resembles
74		a tag if it starts with ``tn:d:`` where ``tn`` is a valid tag name and ``d`` a
75		valid tag datatype (see :ref:`tags` chapter). The fields are parsed from the
76		last to the first.
77
78		As soon as a field is found which does not resemble a tag, all remaining fields
79		are considered positionals (even if another field parsed later resembles a
80		tag). Due to this, invalid tags are sometimes wrongly taken as positional
81		fields (this can be avoided by writing an extension).
82
83		.. doctest::
84
85		>>> g.add_line("X\ta\tb\tcc:i:10\tdd:i:100") #doctest: +ELLIPSIS
86		>>> x1 = g.custom_records_of_type("X")[-1]
87		>>> x1.cc
88		10
89		>>> x1.dd
90		100
91		>>> g.add_line("X\ta\tb\tcc:i:10\tdd:i:100\te") #doctest: +ELLIPSIS
92		>>> x2 = g.custom_records_of_type("X")[-1]
93		>>> x2.cc
94		>>> x2.field3
95		'cc:i:10'
96		>>> g.add_line("Z\ta\tb\tcc:i:10\tddd:i:100") #doctest: +ELLIPSIS
97		>>> x3 = g.custom_records_of_type("Z")[-1]
98		>>> x3.cc
99		>>> x3.field3
100		'cc:i:10'
101		>>> x3.field4
102		'ddd:i:100'
103
104		.. _extensions:
105
106		Extensions
107		~~~~~~~~~~
108
109		The support for custom fields is limited, as Gfapy does not know which and how
110		many fields are there and how shall they be validated. It is possible to create
111		an extension of Gfapy, which defines new record types: this will allow to use
112		these record types in a similar way to the built-in types.
113
114		As an example, an extension will be described, which defines two record types:
115		T for taxa and M for assignments of segments to taxa. For further information
116		about the possible usage case for this extension, see the Supplemental
117		Information to the manuscript describing Gfapy.
118
119		The T records will contain a single positional field, ``tid``, a GFA2
120		identifier, and an optional UL string tag. The M records will contain three
121		positional fields (all three GFA2 identifier): a name field ``mid`` (optional),
122		and two references, ``tid`` to a T line and ``sid`` to an S line. The SC
123		integer tag will be also defined. Here is an example of a GFA containing M and
124		T lines:
125
126		.. code::
127
128		S sA 1000 *
129		S sB 1000 *
130		M assignment1 t123 sA SC:i:40
131		M assignment2 t123 sB
132		M * B12c sB SC:i:20
133		T B12c
134		T t123 UL:Z:http://www.taxon123.com
135
136		Writing subclasses of the :class:`~gfapy.line.line.Line` class, it is possible to
137		communicate to Gfapy, how records of the M and T class shall be handled. This
138		only requires to define some constants and to call the class method
139		:func:`~gfapy.line.line.Line.register_extension`.
140
141		The constants to define are ``RECORD TYPE``, which shall be the content
142		of the record type field (e.g. ``M``); ``POSFIELDS`` shall contain an ordered
143		dict, specifying the datatype for each positional field, in the order these
144		fields are found in the line; ``TAGS_DATATYPE`` is a dict, specifying the
145		datatype of the predefined optional tags; ``NAME_FIELD`` is a field name,
146		and specifies which field contains the identifier of the line.
147		For details on predefined and custom datatypes, see the next sections
148		(:ref:`predefined_datatypes` and :ref:`custom_datatypes`).
149
150		To handle references, :func:`~gfapy.line.line.Line.register_extension`
151		can be supplied with a ``references`` parameter, a list of triples
152		``(fieldname, classname, backreferences)``. Thereby ``fieldname`` is the name
153		of the field in the corresponding record containing the reference (e.g.
154		``sid``), ``classname`` is the name of the class to which the reference goes
155		(e.g. ``gfa.line.segment.GFA2``), and \texttt{backreferences} is how the
156		collection of backreferences shall be called, in the records to which reference
157		points to (e.g. ``metagenomic_assignments``).
158
159		.. code:: python
160
161		from collections include OrderedDict
162
163		class Taxon(gfapy.Line):
164		RECORD_TYPE = "T"
165		POSFIELDS = OrderedDict([("tid","identifier_gfa2")])
166		TAGS_DATATYPE = {"UL":"Z"}
167		NAME_FIELD = "tid"
168
169		Taxon.register_extension()
170
171		class MetagenomicAssignment(gfapy.Line):
172		RECORD_TYPE = "M"
173		POSFIELDS = OrderedDict([("mid","optional_identifier_gfa2"),
174		("tid","identifier_gfa2"),
175		("sid","identifier_gfa2")])
176		TAGS_DATATYPE = {"SC":"i"}
177		NAME_FIELD = "mid"
178
179		MetagenomicAssignment.register_extension(references=
180		[("sid", gfapy.line.segment.GFA2, "metagenomic_assignments"),
181		("tid", Taxon, "metagenomic_assignments")])
182
183		.. _predefined_datatypes:
184
185		Predefined datatypes for extensions
186		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
187
188		The datatype of fields is specified in Gfapy using classes, which provide
189		functions for decoding, encoding and validating the corresponding data.
190		Gfapy contains a number of datatypes which correspond to the description
191		of the field content in the GFA1 and GFA2 specification.
192
193		When writing extensions only the GFA2 field datatypes are generally used
194		(as GFA1 does not contain custom fields). They are summarized in
195		the following table:
196
197		+-------------------------------------+---------------+--------------------------------------------------------+
198		\| Name \| Example \| Description \|
199		+=====================================+===============+========================================================+
200		\| ``alignment_gfa2`` \| ``12M1I3M`` \| CIGAR string, Trace alignment or Placeholder (``*``) \|
201		+-------------------------------------+---------------+--------------------------------------------------------+
202		\| ``identifier_gfa2`` \| ``S1`` \| ID of a line \|
203		+-------------------------------------+---------------+--------------------------------------------------------+
204		\| ``oriented_identifier_gfa2`` \| ``S1+`` \| ID of a line followed by ``+`` or ``-`` \|
205		+-------------------------------------+---------------+--------------------------------------------------------+
206		\| ``optional_identifier_gfa2`` \| ```` \| ID of a line or Placeholder (````) \|
207		+-------------------------------------+---------------+--------------------------------------------------------+
208		\| ``identifier_list_gfa2`` \| ``S1 S2`` \| space separated list of line IDs \|
209		+-------------------------------------+---------------+--------------------------------------------------------+
210		\| ``oriented_identifier_list_gfa2`` \| ``S1+ S2-`` \| space separated list of line IDs plus orientations \|
211		+-------------------------------------+---------------+--------------------------------------------------------+
212		\| ``position_gfa2`` \| ``120$`` \| non-negative integer, optionally followed by ``$`` \|
213		+-------------------------------------+---------------+--------------------------------------------------------+
214		\| ``sequence_gfa2`` \| ``ACGNNYR`` \| sequence of printable chars., no whitespace \|
215		+-------------------------------------+---------------+--------------------------------------------------------+
216		\| ``string`` \| ``a b_c;d`` \| string, no tabs and newlines (Z tags) \|
217		+-------------------------------------+---------------+--------------------------------------------------------+
218		\| ``char`` \| ``A`` \| single character (A tags) \|
219		+-------------------------------------+---------------+--------------------------------------------------------+
220		\| ``float`` \| ``1.12`` \| float (f tags) \|
221		+-------------------------------------+---------------+--------------------------------------------------------+
222		\| ``integer`` \| ``-12`` \| integer (i tags) \|
223		+-------------------------------------+---------------+--------------------------------------------------------+
224		\| ``optional_integer`` \| ``*`` \| integer or placeholder \|
225		+-------------------------------------+---------------+--------------------------------------------------------+
226		\| ``numeric_array`` \| ``c,10,3`` \| array of integers or floats (B tags) \|
227		+-------------------------------------+---------------+--------------------------------------------------------+
228		\| ``byte_array`` \| ``12F1FF`` \| hexadecimal byte string (H tags) \|
229		+-------------------------------------+---------------+--------------------------------------------------------+
230		\| ``json`` \| ``{’b’:2}`` \| JSON string, no tabs and newlines (J tags) \|
231		+-------------------------------------+---------------+--------------------------------------------------------+
232
233		.. _custom_datatypes:
234
235		Custom datatypes for extensions
236		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
237
238		For custom records, one sometimes needs datatypes not yet available in the GFA
239		specification. For example, a custom datatype can be defined for
240		the taxon identifier used in the ``tid`` field of the T and M records:
241		accordingly the taxon identifier shall be only either
242		in the form ``taxon:<n>``, where ``<n>`` is a positive integer,
243		or consist of letters, numbers and underscores only
244		(without ``:``).
245
246		To define the datatype, a class is written, which contains the following
247		functions:
248
249		* ``validate_encoded(string)``: validates the content of the field,
250		if this is a string (e.g., the name of the T line)
251		* ``validate_decoded(object)``: validates the content of the field,
252		if this is not a string (e.g., a reference to a T line)
253		* ``decode(string)``: validates the content of the field (a string)
254		and returns the decoded content; note that references must not be resolved
255		(there is no access to the Gfa instance here), thus the name of the
256		T line will be returned unchanged
257		* ``encode(string)``: validates the content of the field (not in string
258		form) and returns the string which codes it in the GFA file (also here
259		references are validated but not converted into strings)
260
261		Finally the datatype is registered calling
262		:func:`~gfapy.field.field.Field.register_datatype`. The code for
263		the taxon ID extension is the following:
264
265		.. code:: python
266
267		import re
268
269		class TaxonID:
270
271		def validate_encoded(string):
272		if not re.match(r"^taxon:(\d+)$",string) and \
273		not re.match(r"^[a-zA-Z0-9_]+$", string):
274		raise gfapy.ValueError("Invalid taxon ID: {}".format(string))
275
276		def decode(string):
277		TaxonID.validate_encoded(string)
278		return string
279
280		def validate_decoded(obj):
281		if isinstance(obj,Taxon):
282		TaxonID.validate_encoded(obj.name)
283		else:
284		raise gfapy.TypeError(
285		"Invalid type for taxon ID: "+"{}".format(repr(obj)))
286
287		def encode(obj):
288		TaxonID.validate_decoded(obj)
289		return obj
290
291		gfapy.Field.register_datatype("taxon_id", TaxonID)
292
293		To use the new datatype in the T and M lines defined above (:ref:`extensions`),
294		the definition of the two subclasses can be changed:
295		in ``POSFIELDS`` the value ``taxon_id`` shall be assigned to the key ``tid``.

+0

-43

~~doc/tutorial/errors.rst~~ less more

0		.. _errors:
1
2		Errors
3		------
4
5		The different types of errors defined in Gfapy are summarized in the
6		following table. All exception raised in the library are subclasses of
7		`Error`. Thus, ``except gfapy.Error`` can be use to catch
8		all library errors.
9
10		+-----------------------+-------------------------------+---------------------------------+
11		\| Error \| Description \| Examples \|
12		+=======================+===============================+=================================+
13		\| `VersionError` \| An unknown or wrong version \| "GFA0"; or GFA1 in GFA2 context \|
14		\| \| is specified or implied \| \|
15		+-----------------------+-------------------------------+---------------------------------+
16		\| `ValueError` \| The value of an object is \| a negative position is used \|
17		\| \| invalid \| \|
18		+-----------------------+-------------------------------+---------------------------------+
19		\| `TypeError` \| The wrong type has been used \| Z instead of i used for VN tag; \|
20		\| \| or specified \| Hash for an i tag \|
21		+-----------------------+-------------------------------+---------------------------------+
22		\| `FormatError` \| The format of an object is \| a line does not contain the \|
23		\| \| wrong \| expected number of fields \|
24		+-----------------------+-------------------------------+---------------------------------+
25		\| `NotUniqueError` \| Something should be unique \| duplicated tag name or line \|
26		\| \| but is not \| identifier \|
27		+-----------------------+-------------------------------+---------------------------------+
28		\| `InconsistencyError` \| Pieces of information collide \| length of sequence and LN tag \|
29		\| \| with each other \| do not match \|
30		+-----------------------+-------------------------------+---------------------------------+
31		\| `RuntimeError` \| The user tried to do \| editing from/to field in \|
32		\| \| something which is not \| connected links \|
33		\| \| allowed \| \|
34		+-----------------------+-------------------------------+---------------------------------+
35		\| `ArgumentError` \| Problem with the arguments of \| wrong number of arguments in \|
36		\| \| a method \| dynamically created method \|
37		+-----------------------+-------------------------------+---------------------------------+
38		\| `AssertionError` \| Something unexpected happened \| there is a bug in the library or\|
39		\| \| \| the library has been used in \|
40		\| \| \| an unintended way \|
41		+-----------------------+-------------------------------+---------------------------------+
42

+0

-409

~~doc/tutorial/gfa.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4		gfa1 = gfapy.Gfa()
5		gfa1.add_line("H\tVN:Z:1.0")
6		gfa1.add_line("# this is a comment")
7		gfa1.add_line("S\t1\t*")
8		gfa1.add_line("S\t2\t*")
9		gfa1.add_line("S\t3\t*")
10		gfa2 = gfapy.Gfa()
11		gfa2.add_line("H\tVN:Z:2.0\tTS:i:100")
12		gfa2.add_line("X\tcustom line")
13		gfa2.add_line("Y\tcustom line")
14
15		.. _gfa:
16
17		The Gfa class
18		-------------
19
20		The content of a GFA file is represented in Gfapy by an instance of the class
21		:class:`~gfapy.gfa.Gfa`. In most cases, the Gfa instance will be constructed
22		from the data contained in a GFA file, using the method
23		:func:`Gfa.from_file() <gfapy.gfa.Gfa.from_file>`.
24
25		Alternatively, it is possible to use the construct of the class; it takes an
26		optional positional parameter, the content of a GFA file (as string, or as list
27		of strings, one per line of the GFA file). If no GFA content is provided, the
28		Gfa instance will be empty.
29
30		.. doctest::
31
32		>>> gfa = gfapy.Gfa("H\tVN:Z:1.0\nS\tA\t*")
33		>>> print(len(gfa.lines))
34		2
35		>>> gfa = gfapy.Gfa(["H\tVN:Z:1.0", "S\tA\t", "S\tB\t"])
36		>>> print(len(gfa.lines))
37		3
38		>>> gfa = gfapy.Gfa()
39		>>> print(len(gfa.lines))
40		0
41
42		The string representation of the Gfa object (which can be obtained using
43		``str()``) is the textual representation in GFA format.
44		Using :func:`Gfa.to_file(filename) <gfapy.gfa.Gfa.to_file>` allows
45		writing this representation to a GFA file (the content of the file is
46		overwritten).
47
48		.. doctest::
49
50		>>> g1 = gfapy.Gfa()
51		>>> g1.append("H\tVN:Z:1.0")
52		>>> g1.append("S\ta\t*")
53		>>> g1.to_file("my.gfa") #doctest: +SKIP
54		>>> g2 = gfapy.Gfa.from_file("my.gfa") #doctest: +SKIP
55		>>> str(g1)
56		'H\tVN:Z:1.0\nS\ta\t*'
57
58
59		All methods for creating a Gfa (constructor and from_file) accept
60		a ``vlevel`` parameter, the validation level,
61		and can assume the values 0, 1, 2 and 3. A higher value means
62		more validations are performed. The :ref:`validation` chapter explains
63		the meaning of the different validation levels in detail.
64		The default value is 1.
65
66		.. doctest::
67
68		>>> gfapy.Gfa().vlevel
69		1
70		>>> gfapy.Gfa(vlevel = 0).vlevel
71		0
72
73		A further parameter is ``version``. It can be set to ``'gfa1'``,
74		``'gfa2'`` or left to the default value (``None``). The default
75		is to auto-detect the version of the GFA from the line content.
76		If the version is set manually, any content not compatible to the
77		specified version will trigger an exception. If the version is
78		set automatically, an exception will be raised if two lines
79		are found, with content incompatible to each other (e.g. a GFA1
80		segment followed by a GFA2 segment).
81
82		.. doctest::
83
84		>>> g = gfapy.Gfa(version='gfa2')
85		>>> g.version
86		'gfa2'
87		>>> g.add_line("S\t1\t*")
88		Traceback (most recent call last):
89		...
90		gfapy.error.VersionError: Version: 1.0 (None)
91		...
92		>>> g = gfapy.Gfa()
93		>>> g.version
94		>>> g.add_line("S\t1\t*")
95		>>> g.version
96		'gfa1'
97		>>> g.add_line("S\t1\t100\t*")
98		Traceback (most recent call last):
99		...
100		gfapy.error.VersionError: Version: 1.0 (None)
101		...
102
103		Collections of lines
104		~~~~~~~~~~~~~~~~~~~~
105
106		The property :attr:`~gfapy.lines.collections.Collections.lines`
107		of the Gfa object is a list of all the lines
108		in the GFA file (including the header, which is split into single-tag
109		lines). The list itself shall not be modified by the user directly (i.e.
110		adding and removing lines is done using a different interface, see
111		below). However the single elements of the list can be edited.
112
113		.. doctest::
114
115		>>> for line in gfa.lines: print(line)
116
117		For most record types, a list of the lines of the record type is available
118		as a read-only property, which is named after the record type, in plural.
119
120		.. doctest::
121
122		>>> [str(line) for line in gfa1.segments]
123		['S\t1\t', 'S\t2\t', 'S\t3\t*']
124		>>> [str(line) for line in gfa2.fragments]
125		[]
126
127		A particular case are edges; these are in GFA1 links and containments, while in
128		GFA2 there is a unified edge record type, which also allows to represent
129		internal alignments. In Gfapy, the
130		:attr:`~gfapy.lines.collections.Collections.edges` property retrieves all edges
131		(i.e. all E lines in GFA2, and all L and C lines in GFA1). The
132		:attr:`~gfapy.lines.collections.Collections.dovetails` property is a list of
133		all edges which represent dovetail overlaps (i.e. all L lines in GFA1 and a
134		subset of the E lines in GFA2). The
135		:attr:`~gfapy.lines.collections.Collections.containments` property is a list of
136		all edges which represent containments (i.e. all C lines in GFA1 and a subset
137		of the E lines in GFA2).
138
139		.. doctest::
140
141		>>> gfa2.edges
142		[]
143		>>> gfa2.dovetails
144		[]
145		>>> gfa2.containments
146		[]
147
148		Paths are retrieved using the
149		:attr:`~gfapy.lines.collections.Collections.paths` property. This list
150		contains all P lines in GFA1 and all O lines in GFA2. Sets returns the list of
151		all U lines in GFA2 (empty list in GFA1).
152
153		.. doctest::
154
155		>>> gfa2.paths
156		[]
157		>>> gfa2.sets
158		[]
159
160		The header contain metadata in a single or multiple lines. For ease of
161		access to the header information, all its tags are summarized in a
162		single line instance, which is retrieved using the
163		:attr:`~gfapy.lines.headers.Headers.header` property. This list
164		The :ref:`header` chapter of this manual explains more in
165		detail, how to work with the header object.
166
167		.. doctest::
168
169		>>> gfa2.header.TS
170		100
171
172		All lines which start by the string ``#`` are comments; they are handled in
173		the :ref:`comments` chapter and are retrieved using the
174		:attr:`~gfapy.lines.collections.Collections.comments` property.
175
176		.. doctest::
177
178		>>> [str(line) for line in gfa1.comments]
179		['# this is a comment']
180
181		Custom lines are lines of GFA2 files which start
182		with a non-standard record type. Gfapy provides basic built-in support
183		for accessing the information in custom lines, and allows to define
184		extensions for own record types for defining more advanced
185		functionality (see the :ref:`custom_records` chapter).
186
187		.. doctest::
188
189		>>> [str(line) for line in gfa2.custom_records]
190		['X\tcustom line', 'Y\tcustom line']
191		>>> gfa2.custom_record_keys
192		['X', 'Y']
193		>>> [str(line) for line in gfa2.custom_records_of_type('X')]
194		['X\tcustom line']
195
196		Line identifiers
197		~~~~~~~~~~~~~~~~
198
199		Some GFA lines have a mandatory or optional identifier field: segments and
200		paths in GFA1, segments, gaps, edges, paths and sets in GFA2. A line of this
201		type can be retrieved by identifier, using the method
202		:func:`Gfa.line(ID) <gfapy.gfa.Gfa.line>` using the identifier as argument.
203
204		.. doctest::
205
206		>>> str(gfa1.line('1'))
207		'S\t1\t*'
208
209		The GFA2 specification prescribes the exact namespace for the identifier
210		(segments, paths, sets, edges and gaps identifier share the same namespace).
211		The content of this namespace can be retrieved using the
212		:attr:`~gfapy.lines.collections.Collections.names` property.
213		The identifiers of single line types
214		can be retrieved using the properties
215		:attr:`~gfapy.lines.collections.Collections.segment_names`,
216		:attr:`~gfapy.lines.collections.Collections.edge_names`,
217		:attr:`~gfapy.lines.collections.Collections.gap_names`,
218		:attr:`~gfapy.lines.collections.Collections.path_names` and
219		:attr:`~gfapy.lines.collections.Collections.set_names`.
220
221		.. doctest::
222
223		>>> g = gfapy.Gfa()
224		>>> g.add_line("S\tA\t100\t*")
225		>>> g.add_line("S\tB\t100\t*")
226		>>> g.add_line("S\tC\t100\t*")
227		>>> g.add_line("E\tb_c\tB+\tC+\t0\t10\t90\t100$\t*")
228		>>> g.add_line("O\tp1\tB+ C+")
229		>>> g.add_line("U\ts1\tA b_c g")
230		>>> g.add_line("G\tg\tA+\tB-\t1000\t*")
231		>>> g.names
232		['A', 'B', 'C', 'b_c', 'g', 'p1', 's1']
233		>>> g.segment_names
234		['A', 'B', 'C']
235		>>> g.path_names
236		['p1']
237		>>> g.edge_names
238		['b_c']
239		>>> g.gap_names
240		['g']
241		>>> g.set_names
242		['s1']
243
244		The GFA1 specification does not handle the question of the namespace of
245		identifiers explicitly. However, gfapy assumes and enforces
246		a single namespace for segment, path names and the values of the ID tags
247		of L and C lines. The content of this namespace can be found using
248		:attr:`~gfapy.lines.collections.Collections.names` property.
249		The identifiers of single line types
250		can be retrieved using the properties
251		:attr:`~gfapy.lines.collections.Collections.segment_names`,
252		:attr:`~gfapy.lines.collections.Collections.edge_names`
253		(ID tags of links and containments) and
254		:attr:`~gfapy.lines.collections.Collections.path_names`.
255		For GFA1, the properties
256		:attr:`~gfapy.lines.collections.Collections.gap_names`,
257		:attr:`~gfapy.lines.collections.Collections.set_names`
258		contain always empty lists.
259
260		.. doctest::
261
262		>>> g = gfapy.Gfa()
263		>>> g.add_line("S\tA\t*")
264		>>> g.add_line("S\tB\t*")
265		>>> g.add_line("S\tC\t*")
266		>>> g.add_line("L\tB\t+\tC\t+\t*\tID:Z:b_c")
267		>>> g.add_line("P\tp1\tB+,C+\t*")
268		>>> g.names
269		['A', 'B', 'C', 'b_c', 'p1']
270		>>> g.segment_names
271		['A', 'B', 'C']
272		>>> g.path_names
273		['p1']
274		>>> g.edge_names
275		['b_c']
276		>>> g.gap_names
277		[]
278		>>> g.set_names
279		[]
280
281		Identifiers of external sequences
282		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
283
284		Fragments contain identifiers which refer to external sequences
285		(not contained in the GFA file). According to the specification, the
286		these identifiers are not part of the same namespace as the identifier
287		of the GFA lines. They can be retrieved using the
288		:attr:`~gfapy.lines.collections.Collections.external_names`
289		property.
290
291		.. doctest::
292
293		>>> g = gfapy.Gfa()
294		>>> g.add_line("S\tA\t100\t*")
295		>>> g.add_line("F\tA\tread1+\t10\t30\t0\t20$\t20M")
296		>>> g.external_names
297		['read1']
298
299		The method
300		:func:`Gfa.fragments_for_external(external_ID) <gfapy.lines.finders.Finders.fragments_for_external>`
301		retrieves all F lines with a specified external sequence identifier.
302
303		.. doctest::
304
305		>>> f = g.fragments_for_external('read1')
306		>>> len(f)
307		1
308		>>> str(f[0])
309		'F\tA\tread1+\t10\t30\t0\t20$\t20M'
310
311		Adding new lines
312		~~~~~~~~~~~~~~~~
313
314		New lines can be added to a Gfa instance using the
315		:func:`Gfa.add_line(line) <gfapy.lines.creators.Creators.add_line>`
316		method or its alias
317		:func:`Gfa.append(line) <gfapy.lines.creators.Creators.append>`.
318		The argument can be either a string
319		describing a line with valid GFA syntax, or a :class:`~gfapy.line.line.Line`
320		instance. If a string is added, a line instance is created and
321		then added.
322
323		.. doctest::
324
325		>>> g = gfapy.Gfa()
326		>>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
327		>>> g.segment_names
328		['A']
329		>>> g.append("S\tB\t*") #doctest: +ELLIPSIS
330		>>> g.segment_names
331		['A', 'B']
332
333		Editing the lines
334		~~~~~~~~~~~~~~~~~
335
336		Accessing the information stored in the fields of a line instance is
337		described in the :ref:`positional_fields` and :ref:`tags` chapters.
338
339		In Gfapy, a line instance belonging to a Gfa instance is said
340		to be connected to the Gfa instance. Direct editing the content of a connected
341		line is only possible, for those fields which do not contain
342		references to other lines. For more information on how to modify the content of
343		the fields of connected line, see the :ref:`references` chapter.
344
345		.. doctest::
346
347		>>> g = gfapy.Gfa()
348		>>> e = gfapy.Line("E\t\tA+\tB-\t0\t10\t90\t100$\t")
349		>>> e.sid1 = "C+"
350		>>> g.add_line(e) #doctest: +ELLIPSIS
351		>>> e.sid1 = "A+"
352		Traceback (most recent call last):
353		gfapy.error.RuntimeError: ...
354
355		Removing lines
356		~~~~~~~~~~~~~~
357
358		Disconnecting a line from the Gfa instance is done using the
359		:func:`Gfa.rm(line) <gfapy.lines.destructors.Destructors.rm>` method. The
360		argument can be a line instance or the name of a line.
361
362		In alternative, a line instance can also be disconnected using the
363		`disconnect` method on it. Disconnecting a line
364		may trigger other operations, such as the disconnection of other lines (see the
365		:ref:`references` chapter).
366
367		.. doctest::
368
369		>>> g = gfapy.Gfa()
370		>>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
371		>>> g.segment_names
372		['A']
373		>>> g.rm('A') #doctest: +ELLIPSIS
374		>>> g.segment_names
375		[]
376		>>> g.append("S\tB\t*") #doctest: +ELLIPSIS
377		>>> g.segment_names
378		['B']
379		>>> b = g.line('B')
380		>>> b.disconnect()
381		>>> g.segment_names
382		[]
383
384		Renaming lines
385		~~~~~~~~~~~~~~
386
387		Lines with an identifier can be renamed. This is done simply by editing
388		the corresponding field (such as ``name`` or ``sid`` for a segment).
389		This field is not a reference to another line and can be freely edited
390		also in line instances connected to a Gfa. All references to the line
391		from other lines will still be up to date, as they will refer to the
392		same instance (whose name has been changed) and their string
393		representation will use the new name.
394
395		.. doctest::
396
397		>>> g = gfapy.Gfa()
398		>>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
399		>>> g.add_line("L\tA\t+\tB\t-\t*") #doctest: +ELLIPSIS
400		>>> g.segment_names
401		['A', 'B']
402		>>> g.dovetails[0].from_name
403		'A'
404		>>> g.segment('A').name = 'C'
405		>>> g.segment_names
406		['B', 'C']
407		>>> g.dovetails[0].from_name
408		'C'

+0

-14

~~doc/tutorial/graph_operations.rst~~ less more

0		.. _graph_operations:
1
2		Graph operations
3		----------------
4
5		Graph operations such as linear paths merging, multiplication of
6		segments and other are provided. These operations are implemented
7		in analogy to those provided by the Ruby library RGFA. As RGFA only
8		handles GFA1 graphs, only dovetail overlaps are considered as
9		connections. A detailed description of the operation can be
10		found in Gonnella and Kurtz (2016). More information about the
11		single operations are found in the method documentation of the
12		submodules of `GraphOperations`.
13

+0

-169

~~doc/tutorial/header.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _header:
6
7		The Header
8		----------
9
10		GFA files may contain one or multiple header lines (record type: "H"). These
11		lines may be present in any part of the file, not necessarily at the beginning.
12
13		Although the header may consist of multiple lines, its content refers to the
14		whole file. Therefore in Gfapy the header is accessed using a single line
15		instance (accessible by the :attr:`~gfapy.lines.headers.Headers.header`
16		property). Header lines contain only tags. If not header line is present in the
17		Gfa, then the header line object will be empty (i.e. contain no tags).
18
19		Note that header lines cannot be connected to the Gfa as other lines (i.e.
20		calling :meth:`~gfapy.line.common.connection.Connection.connect` on them raises
21		an exception). Instead they must be merged to the existing Gfa header, using
22		`add_line` on the Gfa instance.
23
24		.. doctest::
25
26		>>> gfa.add_line("H\tnn:f:1.0") #doctest: +ELLIPSIS
27		>>> gfa.header.nn
28		1.0
29		>>> gfapy.Line("H\tnn:f:1.0").connect(gfa)
30		Traceback (most recent call last):
31		...
32		gfapy.error.RuntimeError: ...
33
34		Multiple definitions of the predefined header tags
35		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
36
37		For the predefined tags (``VN`` and ``TS``), the presence of multiple
38		values in different lines is an error, unless the value is the same in
39		each instance (in which case the repeated definitions are ignored).
40
41		.. doctest::
42
43		>>> gfa.add_line("H\tVN:Z:1.0") #doctest: +ELLIPSIS
44		>>> gfa.add_line("H\tVN:Z:1.0") # ignored #doctest: +ELLIPSIS
45		>>> gfa.add_line("H\tVN:Z:2.0")
46		Traceback (most recent call last):
47		...
48		gfapy.error.VersionError: ...
49
50		Multiple definitions of custom header tags
51		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
52
53		If the tags are present only once in the header in its entirety, the access to
54		the tags is the same as for any other line (see the :ref:`tags` chapter).
55
56		However, the specification does not forbid custom tags to be defined with
57		different values in different header lines (which we name "multi-definition
58		tags"). This particular case is handled in the next sections.
59
60		Reading multi-definitions tags
61		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
62
63		Reading, validating and setting the datatype of multi-definition tags is done
64		using the same methods as for all other lines (see the :ref:`tags` chapter).
65		However, if a tag is defined multiple times on multiple H lines, reading the
66		tag will return a list of the values on the lines. This array is an instance of
67		the subclass ``gfapy.FieldArray`` of list.
68
69		.. doctest::
70
71		>>> gfa.add_line("H\txx:i:1") #doctest: +ELLIPSIS
72		>>> gfa.add_line("H\txx:i:2") #doctest: +ELLIPSIS
73		>>> gfa.add_line("H\txx:i:3") #doctest: +ELLIPSIS
74		>>> gfa.header.xx
75		gfapy.FieldArray('i',[1, 2, 3])
76
77		Setting tags
78		~~~~~~~~~~~~
79
80		There are two possibilities to set a tag for the header. The first is
81		the normal tag interface (using ``set`` or the tag name property). The
82		second is to use ``add``. The latter supports multi-definition tags,
83		i.e. it adds the value to the previous ones (if any), instead of
84		overwriting them.
85
86		.. doctest::
87
88		>>> gfa = gfapy.Gfa()
89		>>> gfa.header.xx
90		>>> gfa.header.add("xx", 1)
91		>>> gfa.header.xx
92		1
93		>>> gfa.header.add("xx", 2)
94		>>> gfa.header.xx
95		gfapy.FieldArray('i',[1, 2])
96		>>> gfa.header.set("xx", 3)
97		>>> gfa.header.xx
98		3
99
100		Modifying field array values
101		~~~~~~~~~~~~~~~~~~~~~~~~~~~~
102
103		Field arrays can be modified directly (e.g. adding new values or
104		removing some values). After modification, the user may check if the
105		array values remain compatible with the datatype of the tag using the
106		:meth:`~gfapy.line.common.validate.Validate.validate_field`` method.
107
108		.. doctest::
109
110		>>> gfa = gfapy.Gfa()
111		>>> gfa.header.xx = gfapy.FieldArray('i',[1,2,3])
112		>>> gfa.header.xx
113		gfapy.FieldArray('i',[1, 2, 3])
114		>>> gfa.header.validate_field("xx")
115		>>> gfa.header.xx.append("X")
116		>>> gfa.header.validate_field("xx")
117		Traceback (most recent call last):
118		...
119		gfapy.error.FormatError: ...
120
121		If the field array is modified using array methods which return a list
122		or data of any other type, a field array must be constructed, setting
123		its datatype to the value returned by calling
124		:meth:`~gfapy.line.common.field_datatype.FieldDatatype.get_datatype`
125		on the header.
126
127		.. doctest::
128
129		>>> gfa = gfapy.Gfa()
130		>>> gfa.header.xx = gfapy.FieldArray('i',[1,2,3])
131		>>> gfa.header.xx
132		gfapy.FieldArray('i',[1, 2, 3])
133		>>> gfa.header.xx = gfapy.FieldArray(gfa.header.get_datatype("xx"),
134		... list(map(lambda x: x+1, gfa.header.xx)))
135		>>> gfa.header.xx
136		gfapy.FieldArray('i',[2, 3, 4])
137
138		String representation of the header
139		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
140
141		For consistency with other line types, the string representation of the header
142		is a single-line string, eventually non standard-compliant, if it contains
143		multiple instances of the tag. (and when calling
144		:meth:`~gfapy.line.common.writer.Writer.field_to_s` for a tag present multiple
145		times, the output string will contain the instances of the tag, separated by
146		tabs).
147
148		However, when the Gfa is output to file or string, the header is split into
149		multiple H lines with single tags, so that standard-compliant GFA is output.
150		The split header can be retrieved using the
151		:attr:`~gfapy.lines.headers.Headers.headers` property of the Gfa instance.
152
153		.. doctest::
154
155		>>> gfa = gfapy.Gfa()
156		>>> gfa.header.VN = "1.0"
157		>>> gfa.header.xx = gfapy.FieldArray('i',[1,2])
158		>>> gfa.header.field_to_s("xx")
159		'1\t2'
160		>>> gfa.header.field_to_s("xx", tag=True)
161		'xx:i:1\txx:i:2'
162		>>> str(gfa.header)
163		'H\tVN:Z:1.0\txx:i:1\txx:i:2'
164		>>> [str(h) for h in gfa.headers]
165		['H\tVN:Z:1.0', 'H\txx:i:1', 'H\txx:i:2']
166		>>> str(gfa)
167		'H\tVN:Z:1.0\nH\txx:i:1\nH\txx:i:2'
168

+0

-69

~~doc/tutorial/placeholders.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3
4		.. _placeholders:
5
6		Placeholders
7		------------
8
9		Some positional fields may contain an undefined value S: ``sequence``;
10		L/C: ``overlap``; P: ``overlaps``; E: ``eid``, ``alignment``; F:
11		``alignment``; G: ``gid``, ``var``; U/O: ``pid``. In GFA this value is
12		represented by a ``*``.
13
14		In Gfapy the class `Placeholder` represent the undefined value.
15
16		Distinguishing placeholders
17		~~~~~~~~~~~~~~~~~~~~~~~~~~~
18
19		The :func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>` method
20		allows to check if a value is a placeholder; a value is a placeholder if
21		it is a `Placeholder` instance, or would represent
22		a placeholder in GFA (a string containing ``*``), or would be represented
23		by a placeholder in GFA (e.g. an empty array).
24
25		.. doctest::
26
27		>>> gfapy.is_placeholder("*")
28		True
29		>>> gfapy.is_placeholder("**")
30		False
31		>>> gfapy.is_placeholder([])
32		True
33		>>> gfapy.is_placeholder(gfapy.Placeholder())
34		True
35
36		Note that, as a placeholder is ``False`` in boolean context, just a
37		``if not placeholder`` will also work, if the value is an instance
38		of `Placeholder`, but not always for the other cases (in particular not
39		for the string representation ``*``).
40		Therefore using
41		:func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>`
42		is better.
43
44		.. doctest::
45
46		>>> if "": print(' is not a placeholder')
47		* is not a placeholder
48		>>> if gfapy.is_placeholder("*"): print('but it represents a placeholder')
49		but it represents a placeholder
50
51		Compatibility methods
52		~~~~~~~~~~~~~~~~~~~~~
53
54		Some methods are defined for placeholders, which allow them to respond
55		to the same methods as defined values. This allows to write generic
56		code.
57
58		.. doctest::
59
60		>>> placeholder = gfapy.Placeholder()
61		>>> placeholder.validate() # does nothing
62		>>> len(placeholder)
63		0
64		>>> placeholder[1]
65		gfapy.Placeholder()
66		>>> placeholder + 1
67		gfapy.Placeholder()
68

+0

-448

~~doc/tutorial/positional_fields.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _positional_fields:
6
7		Positional fields
8		-----------------
9
10		Most lines in GFA have positional fields (Headers are an exception).
11		During parsing, if a line is encountered, which has too less or too many
12		positional fields, an exception will be thrown. The correct number of
13		positional fields is record type-specific.
14
15		Positional fields are recognized by its position in the line. Each
16		positional field has an implicit field name and datatype associated with
17		it.
18
19		Field names
20		~~~~~~~~~~~
21
22		The field names are derived from the specification. Lower case versions
23		of the field names are used and spaces are substituted with underscores.
24		In some cases, the field names were changed, as they represent keywords
25		in common programming languages (``from``, ``send``).
26
27		The following tables shows the field names used in Gfapy, for each kind
28		of line. Headers have no positional fields. Comments and custom records
29		follow particular rules, see the respective chapters (:ref:`comments` and
30		:ref:`custom_records`).
31
32		GFA1 field names
33		^^^^^^^^^^^^^^^^
34
35		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
36		\| Record Type \| Field 1 \| Field 2 \| Field 3 \| Field 4 \| Field 5 \| Field 6 \|
37		+===============+====================+=====================+==================+=================+===============+===============+
38		\| Segment \| ``name`` \| ``sequence`` \| \| \| \| \|
39		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
40		\| Link \| ``from_segment`` \| ``from_orient`` \| ``to_segment`` \| ``to_orient`` \| ``overlap`` \| \|
41		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
42		\| Containment \| ``from_segment`` \| ``from_orient`` \| ``to_segment`` \| ``to_orient`` \| ``pos`` \| ``overlap`` \|
43		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
44		\| Path \| ``path_name`` \| ``segment_names`` \| ``overlaps`` \| \| \| \|
45		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
46
47		GFA2 field names
48		^^^^^^^^^^^^^^^^
49
50		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
51		\| Record Type \| Field 1 \| Field 2 \| Field 3 \| Field 4 \| Field 5 \| Field 6 \| Field 7 \| Field 8 \|
52		+===============+===========+================+================+=============+=============+=============+=================+=================+
53		\| Segment \| ``sid`` \| ``slen`` \| ``sequence`` \| \| \| \| \| \|
54		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
55		\| Edge \| ``eid`` \| ``sid1`` \| ``sid2`` \| ``beg1`` \| ``end1`` \| ``beg2`` \| ``end2`` \| ``alignment`` \|
56		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
57		\| Fragment \| ``sid`` \| ``external`` \| ``s_beg`` \| ``s_end`` \| ``f_beg`` \| ``f_end`` \| ``alignment`` \| \|
58		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
59		\| Gap \| ``gid`` \| ``sid1`` \| ``d1`` \| ``d2`` \| ``sid2`` \| ``disp`` \| ``var`` \| \|
60		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
61		\| Set \| ``pid`` \| ``items`` \| \| \| \| \| \| \|
62		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
63		\| Path \| ``pid`` \| ``items`` \| \| \| \| \| \| \|
64		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
65
66		Datatypes
67		~~~~~~~~~
68
69		The datatype of each positional field is described in the specification
70		and cannot be changed (differently from tags). Here is a short
71		description of the Python classes used to represent data for different
72		datatypes.
73
74		Placeholders
75		^^^^^^^^^^^^
76
77		The positional fields in GFA can never be empty. However, there are some
78		fields with optional values. If a value is not specified, a placeholder
79		character is used instead (``*``). Such undefined values are represented
80		in Gfapy by the `Placeholder` class, which is described more in
81		detail in the :ref:`placeholders` chapter.
82
83		Arrays
84		^^^^^^
85
86		The ``items`` field in unordered and ordered groups and the
87		``segment_names`` and ``overlaps`` fields in paths are lists of objects
88		and are represented by list instances.
89
90		.. doctest::
91
92		>>> set = gfapy.Line("U\t*\t1 A 2")
93		>>> type(set.items)
94		<class 'list'>
95		>>> gfa2_path = gfapy.Line("O\t*\tA+ B-")
96		>>> type(gfa2_path.items)
97		<class 'list'>
98		>>> gfa1_path = gfapy.Line("P\tp1\tA+,B-\t10M,9M1D1M")
99		>>> type(gfa1_path.segment_names)
100		<class 'list'>
101		>>> type(gfa1_path.overlaps)
102		<class 'list'>
103
104		Orientations
105		^^^^^^^^^^^^
106
107		Orientations are represented by strings. The ``gfapy.invert()`` method
108		applied to an orientation string returns the other orientation.
109
110		.. doctest::
111
112		>>> gfapy.invert("+")
113		'-'
114		>>> gfapy.invert("-")
115		'+'
116
117		Identifiers
118		^^^^^^^^^^^
119
120		The identifier of the line itself (available for S, P, E, G, U, O lines)
121		can always be accessed in Gfapy using the ``name`` alias and is
122		represented in Gfapy by a string. If it is optional (E, G, U, O lines)
123		and not specified, it is represented by a Placeholder instance. The
124		fragment identifier is also a string.
125
126		Identifiers which refer to other lines are also present in some line
127		types (L, C, E, G, U, O, F). These are never placeholders and in
128		stand-alone lines are represented by strings. In connected lines they
129		are references to the Line instances to which they refer to (see the
130		:ref:`references` chapter).
131
132		Oriented identifiers
133		^^^^^^^^^^^^^^^^^^^^
134
135		Oriented identifiers (e.g. ``segment_names`` in GFA1 paths) are
136		represented by elements of the class ``gfapy.OrientedLine``. The
137		``segment`` method of the oriented segments returns the segment
138		identifier (or segment reference in connected path lines) and the
139		``orient`` method returns the orientation string. The ``name`` method
140		returns the string of the segment, even if this is a reference to a
141		segment. A new oriented line can be created using the
142		``OL[line, orientation]`` method.
143
144		Calling ``invert`` returns an oriented segment, with inverted
145		orientation. To set the two attributes the methods ``segment=`` and
146		``orient=`` are available.
147
148		Examples:
149
150		.. doctest::
151
152		>>> p = gfapy.Line("P\tP1\ta+,b-\t*")
153		>>> p.segment_names
154		[gfapy.OrientedLine('a','+'), gfapy.OrientedLine('b','-')]
155		>>> sn0 = p.segment_names[0]
156		>>> sn0.line
157		'a'
158		>>> sn0.name
159		'a'
160		>>> sn0.orient
161		'+'
162		>>> sn0.invert()
163		>>> sn0
164		gfapy.OrientedLine('a','-')
165		>>> sn0.orient
166		'-'
167		>>> sn0.line = gfapy.Line('S\tX\t*')
168		>>> str(sn0)
169		'X-'
170		>>> sn0.name
171		'X'
172		>>> sn0 = gfapy.OrientedLine(gfapy.Line('S\tY\t*'), '+')
173
174		Sequences
175		^^^^^^^^^
176
177		Sequences (S field sequence) are represented by strings in Gfapy.
178		Depending on the GFA version, the alphabet definition is more or less
179		restrictive. The definitions are correctly applied by the validation
180		methods.
181
182		The method ``rc()`` is provided to compute the reverse complement of a
183		nucleotidic sequence. The extended IUPAC alphabet is understood by the
184		method. Applied to non nucleotidic sequences, the results will be
185		meaningless:
186
187		.. doctest::
188
189		>>> from gfapy.sequence import rc
190		>>> rc("gcat")
191		'atgc'
192		>>> rc("*")
193		'*'
194		>>> rc("yatc")
195		'gatr'
196		>>> rc("gCat")
197		'atGc'
198		>>> rc("cag", rna=True)
199		'cug'
200
201		Integers and positions
202		^^^^^^^^^^^^^^^^^^^^^^
203
204		The C lines ``pos`` field and the G lines ``disp`` and ``var`` fields
205		are represented by integers. The ``var`` field is optional, and thus can
206		be also a placeholder. Positions are 0-based coordinates.
207
208		The position fields of GFA2 E lines (``beg1, beg2, end1, end2``) and F
209		lines (``s_beg, s_end, f_beg, f_end``) contain a dollar string as suffix
210		if the position is equal to the segment length. For more information,
211		see the :ref:`positions` chapter.
212
213		Alignments
214		^^^^^^^^^^
215
216		Alignments are always optional, ie they can be placeholders. If they are
217		specified they are CIGAR alignments or, only in GFA2, trace alignments.
218		For more details, see the :ref:`alignments` chapter.
219
220		GFA1 datatypes
221		^^^^^^^^^^^^^^
222
223		+------------------------+---------------+--------------------------------+
224		\| Datatype \| Record Type \| Fields \|
225		+========================+===============+================================+
226		\| Identifier \| Segment \| ``name`` \|
227		+------------------------+---------------+--------------------------------+
228		\| \| Path \| ``path_name`` \|
229		+------------------------+---------------+--------------------------------+
230		\| \| Link \| ``from_segment, to_segment`` \|
231		+------------------------+---------------+--------------------------------+
232		\| \| Containment \| ``from_segment, to_segment`` \|
233		+------------------------+---------------+--------------------------------+
234		\| [OrientedIdentifier] \| Path \| ``segment_names`` \|
235		+------------------------+---------------+--------------------------------+
236		\| Orientation \| Link \| ``from_orient, to_orient`` \|
237		+------------------------+---------------+--------------------------------+
238		\| \| Containment \| ``from_orient, to_orient`` \|
239		+------------------------+---------------+--------------------------------+
240		\| Sequence \| Segment \| ``sequence`` \|
241		+------------------------+---------------+--------------------------------+
242		\| Alignment \| Link \| ``overlap`` \|
243		+------------------------+---------------+--------------------------------+
244		\| \| Containment \| ``overlap`` \|
245		+------------------------+---------------+--------------------------------+
246		\| [Alignment] \| Path \| ``overlaps`` \|
247		+------------------------+---------------+--------------------------------+
248		\| Position \| Containment \| ``pos`` \|
249		+------------------------+---------------+--------------------------------+
250
251		GFA2 datatypes
252		^^^^^^^^^^^^^^
253
254		+------------------------+---------------+----------------------------------+
255		\| Datatype \| Record Type \| Fields \|
256		+========================+===============+==================================+
257		\| Itentifier \| Segment \| ``sid`` \|
258		+------------------------+---------------+----------------------------------+
259		\| \| Fragment \| ``sid`` \|
260		+------------------------+---------------+----------------------------------+
261		\| OrientedIdentifier \| Edge \| ``sid1, sid2`` \|
262		+------------------------+---------------+----------------------------------+
263		\| \| Gap \| ``sid1, sid2`` \|
264		+------------------------+---------------+----------------------------------+
265		\| \| Fragment \| ``external`` \|
266		+------------------------+---------------+----------------------------------+
267		\| OptionalIdentifier \| Edge \| ``eid`` \|
268		+------------------------+---------------+----------------------------------+
269		\| \| Gap \| ``gid`` \|
270		+------------------------+---------------+----------------------------------+
271		\| \| U Group \| ``oid`` \|
272		+------------------------+---------------+----------------------------------+
273		\| \| O Group \| ``uid`` \|
274		+------------------------+---------------+----------------------------------+
275		\| [Identifier] \| U Group \| ``items`` \|
276		+------------------------+---------------+----------------------------------+
277		\| [OrientedIdentifier] \| O Group \| ``items`` \|
278		+------------------------+---------------+----------------------------------+
279		\| Sequence \| Segment \| ``sequence`` \|
280		+------------------------+---------------+----------------------------------+
281		\| Alignment \| Edge \| ``alignment`` \|
282		+------------------------+---------------+----------------------------------+
283		\| \| Fragment \| ``alignment`` \|
284		+------------------------+---------------+----------------------------------+
285		\| Position \| Edge \| ``beg1, end1, beg2, end2`` \|
286		+------------------------+---------------+----------------------------------+
287		\| \| Fragment \| ``s_beg, s_end, f_beg, f_end`` \|
288		+------------------------+---------------+----------------------------------+
289		\| Integer \| Gap \| ``disp, var`` \|
290		+------------------------+---------------+----------------------------------+
291
292		Reading and writing positional fields
293		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
294
295		The ``positional_fieldnames`` method returns the list of the names (as
296		strings) of the positional fields of a line. The positional fields can
297		be read using a method on the Gfapy line object, which is called as the
298		field name. Setting the value is done with an equal sign version of the
299		field name method (e.g. segment.slen = 120). In alternative, the
300		``set(fieldname, value)`` and ``get(fieldname)`` methods can also be
301		used.
302
303		.. doctest::
304
305		>>> s_gfa1 = gfapy.Line("S\t1\t*")
306		>>> s_gfa1.positional_fieldnames
307		['name', 'sequence']
308		>>> s_gfa1.name
309		'1'
310		>>> s_gfa1.get("name")
311		'1'
312		>>> s_gfa1.name = "segment2"
313		>>> s_gfa1.name
314		'segment2'
315		>>> s_gfa1.set('name',"3")
316		>>> s_gfa1.name
317		'3'
318
319		When a field is read, the value is converted into an appropriate object.
320		The string representation of a field can be read using the
321		``field_to_s(fieldname)`` method.
322
323		.. doctest::
324
325		>>> gfa = gfapy.Gfa()
326		>>> gfa.add_line("S\ts1\t*")
327		>>> gfa.add_line("L\ts1\t+\ts2\t-\t*")
328		>>> link = gfa.dovetails[0]
329		>>> str(link.from_segment)
330		'S\ts1\t*'
331		>>> link.field_to_s('from_segment')
332		's1'
333
334		When setting a non-string field, the user can specify the value of a tag
335		either as a Python non-string object, or as the string representation of
336		the value.
337
338		.. doctest::
339
340		>>> gfa = gfapy.Gfa(version='gfa1')
341		>>> gfa.add_line("C\ta\t+\tb\t-\t10\t*")
342		>>> c = gfa.containments[0]
343		>>> c.pos
344		10
345		>>> c.pos = 1
346		>>> c.pos
347		1
348		>>> c.pos = "2"
349		>>> c.pos
350		2
351		>>> c.field_to_s("pos")
352		'2'
353
354		Note that setting the value of reference and backreferences-related
355		fields is generally not allowed, when a line instance is connected to a
356		Gfa object (see the :ref:`references` chapter).
357
358		.. doctest::
359
360		>>> gfa = gfapy.Gfa(version='gfa1')
361		>>> l = gfapy.Line("L\ts1\t+\ts2\t-\t*")
362		>>> l.from_name
363		's1'
364		>>> l.from_segment = "s3"
365		>>> l.from_name
366		's3'
367		>>> gfa.add_line(l)
368		>>> l.from_segment = "s4"
369		Traceback (most recent call last):
370		...
371		gfapy.error.RuntimeError: ...
372
373		Validation
374		~~~~~~~~~~
375
376		The content of all positional fields must be a correctly formatted
377		string according to the rules given in the GFA specifications (or a
378		Python object whose string representation is a correctly formatted
379		string).
380
381		Depending on the validation level, more or less checks are done
382		automatically (see the :ref:`validation` chapter). Not regarding which
383		validation level is selected, the user can trigger a manual validation
384		using the ``validate_field(fieldname)`` method for a single field, or
385		using ``validate``, which does a full validation on the whole line,
386		including all positional fields.
387
388		.. doctest::
389
390		>>> line = gfapy.Line("H\txx:i:1")
391		>>> line.validate_field("xx")
392		>>> line.validate()
393
394		Aliases
395		~~~~~~~
396
397		For some fields, aliases are defined, which can be used in all contexts
398		where the original field name is used (i.e. as parameter of a method,
399		and the same setter and getter methods defined for the original field
400		name are also defined for each alias, see below).
401
402		.. doctest::
403
404		>>> gfa1_path = gfapy.Line("P\tX\t1-,2+,3+\t*")
405		>>> gfa1_path.name == gfa1_path.path_name
406		True
407		>>> edge = gfapy.Line("E\t\tA+\tB-\t0\t10\t90\t100$\t")
408		>>> edge.eid == edge.name
409		True
410		>>> containment = gfapy.Line("C\tA\t+\tB\t-\t10\t*")
411		>>> containment.from_segment == containment.container
412		True
413		>>> segment = gfapy.Line("S\t1\t*")
414		>>> segment.sid == segment.name
415		True
416		>>> segment.sid
417		'1'
418		>>> segment.name = '2'
419		>>> segment.sid
420		'2'
421
422		Name
423		^^^^
424
425		Different record types have an identifier field: segments (name in GFA1,
426		sid in GFA2), paths (path\_name), edge (eid), fragment (sid), gap (gid),
427		groups (pid).
428
429		All these fields are aliased to ``name``. This allows the user for
430		example to set the identifier of a line using the ``name=(value)``
431		method using the same syntax for different record types (segments,
432		edges, paths, fragments, gaps and groups).
433
434		Version-specific field names
435		^^^^^^^^^^^^^^^^^^^^^^^^^^^^
436
437		For segments the GFA1 name and the GFA2 sid are equivalent fields. For
438		this reason an alias ``sid`` is defined for GFA1 segments and ``name``
439		for GFA2 segments.
440
441		Crypical field names
442		^^^^^^^^^^^^^^^^^^^^
443
444		The definition of from and to for containments is somewhat cryptic.
445		Therefore following aliases have been defined for containments:
446		container[\_orient] for from[\_\\|segment\\|orient]; contained[\_orient]
447		for to[\_segment\\|orient].

+0

-75

~~doc/tutorial/positions.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3
4		.. _positions:
5
6		Positions
7		---------
8
9		The only position field in GFA1 is the ``pos`` field in the C lines.
10		This represents the starting position of the contained segment in the
11		container segment and is 0-based.
12
13		Some fields in GFA2 E lines (``beg1, beg2, end1, end2``) and F lines
14		(``s_beg, s_end, f_beg, f_end``) are positions. According to the
15		specification, they are 0-based and represent virtual ticks before and
16		after each string in the sequence. Thus ranges are represented similarly
17		to the Python range conventions: e.g. a 1-character prefix of a sequence
18		will have begin 0 and end 1.
19
20		Last positions in GFA2
21		~~~~~~~~~~~~~~~~~~~~~~
22
23		The GFA2 positions must contain an additional string (``$``) appended to
24		the integer, if (and only if) they are the last position in the segment
25		sequence. These particular positions are represented in Gfapy as
26		instances of the class :class:`~gfapy.lastpos.LastPos`.
27
28		To create a lastpos instance, the constructor can be used with an
29		integer, or the string representation (which must end with the dollar
30		sign, otherwise an integer is returned):
31
32		.. doctest::
33
34		>>> str(gfapy.LastPos(12))
35		'12$'
36		>>> gfapy.LastPos("12")
37		12
38		>>> str(gfapy.LastPos("12"))
39		'12'
40		>>> gfapy.LastPos("12$")
41		gfapy.LastPos(12)
42		>>> str(gfapy.LastPos("12$"))
43		'12$'
44
45		Subtracting an integer from a lastpos returns a lastpos if 0 subtracted,
46		an integer otherwise. This allows to do some arithmetic on positions
47		without making them invalid.
48
49		.. doctest::
50
51		>>> gfapy.LastPos(12) - 0
52		gfapy.LastPos(12)
53		>>> gfapy.LastPos(12) - 1
54		11
55
56		The functions :func:`~gfapy.lastpos.islastpos` and
57		:func:`~gfapy.lastpos.isfirstpos` allow to
58		determine if a position value is 0 (first), or the last position, using
59		the same syntax for lastpos and integer instances.
60
61		.. doctest::
62
63		>>> gfapy.isfirstpos(0)
64		True
65		>>> gfapy.islastpos(0)
66		False
67		>>> gfapy.isfirstpos(12)
68		False
69		>>> gfapy.islastpos(12)
70		False
71		>>> gfapy.islastpos(gfapy.LastPos("12"))
72		False
73		>>> gfapy.islastpos(gfapy.LastPos("12$"))
74		True

+0

-443

~~doc/tutorial/references.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _references:
6
7		References
8		----------
9
10		Some fields in GFA lines contain identifiers or lists of identifiers
11		(sometimes followed by orientation strings), which reference other lines
12		of the GFA file. In Gfapy it is possible to follow these references and
13		traverse the graph.
14
15		Connecting a line to a Gfa object
16		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17
18		In stand-alone line instances, the identifiers which reference other
19		lines are either strings containing the line name, pairs of strings
20		(name and orientation) in a ``gfapy.OrientedLine`` object, or lists of
21		lines names or ``gfapy.OrientedLine`` objects.
22
23		Using the ``add_line(line)`` (alias: ``append(line)``) method of the
24		``gfapy.Gfa`` object, or the equivalent ``connect(gfa)`` method of the
25		gfapy.Line instance, a line is added to a Gfa instance (this is done
26		automatically when a GFA file is parsed). All strings expressing
27		references are then changed into references to the corresponding line
28		objects. The method ``is_connected()`` allows to determine if a line is
29		connected to a gfapy instance. The read-only property ``gfa`` contains
30		the ``gfapy.Gfa`` instance to which the line is connected.
31
32		.. doctest::
33
34		>>> gfa = gfapy.Gfa(version='gfa1')
35		>>> link = gfapy.Line("L\tA\t-\tB\t+\t20M")
36		>>> link.is_connected()
37		False
38		>>> link.gfa is None
39		True
40		>>> type(link.from_segment)
41		<class 'str'>
42		>>> gfa.append(link)
43		>>> link.is_connected()
44		True
45		>>> link.gfa #doctest: +ELLIPSIS
46		<gfapy.gfa.Gfa object at ...>
47		>>> type(link.from_segment)
48		<class 'gfapy.line.segment.gfa1.GFA1'>
49
50		References for each record type
51		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
52
53		The following tables describes the references contained in each record
54		type. The notation ``[]`` represent lists.
55
56		GFA1
57		^^^^
58
59		+---------------+-------------------+---------------------------+
60		\| Record type \| Fields \| Type of reference \|
61		+===============+===================+===========================+
62		\| Link \| from, to \| Segment \|
63		+---------------+-------------------+---------------------------+
64		\| Containment \| from, to \| Segment \|
65		+---------------+-------------------+---------------------------+
66		\| Path \| segment\_names, \| [OrientedLine(Segment)] \|
67		+---------------+-------------------+---------------------------+
68		\| \| links (1) \| [OrientedLine(Link)] \|
69		+---------------+-------------------+---------------------------+
70
71		(1): paths contain information in the fields segment\_names and
72		overlaps, which allow to find the identify from which they depend; these
73		links can be retrieved using ``links`` (which is not a field).
74
75		GFA2
76		^^^^
77
78		+---------------+--------------+------------------------------------+
79		\| Record type \| Fields \| Type of reference \|
80		+===============+==============+====================================+
81		\| Edge \| sid1, sid2 \| Segment \|
82		+---------------+--------------+------------------------------------+
83		\| Gap \| sid1, sid2 \| Segment \|
84		+---------------+--------------+------------------------------------+
85		\| Fragment \| sid \| Segment \|
86		+---------------+--------------+------------------------------------+
87		\| Set \| items \| [Edge/Set/Path/Segment] \|
88		+---------------+--------------+------------------------------------+
89		\| Path \| items \| [OrientedLine(Edge/Set/Segment)] \|
90		+---------------+--------------+------------------------------------+
91
92		Backreferences for each record type
93		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
94
95		When a line containing a reference to another line is connected to a Gfa
96		object, backreferences to it are created in the targeted line.
97
98		For each backreference collection a read-only property exist, which is
99		named as the collection (e.g. ``dovetails_L`` for segments). Note that
100		the reference list returned by these arrays are read-only and editing
101		the references is done using other methods (see the section "Editing
102		reference fields" below).
103
104		.. code:: python
105
106		segment.dovetails_L # => [gfapy.line.edge.Link(...), ...]
107
108		The following tables describe the backreferences collections for each
109		record type.
110
111		GFA1
112		^^^^
113
114		+---------------+-------------------------+
115		\| Record type \| Backreferences \|
116		+===============+=========================+
117		\| Segment \| dovetails\_L \|
118		+---------------+-------------------------+
119		\| \| dovetails\_R \|
120		+---------------+-------------------------+
121		\| \| edges\_to\_contained \|
122		+---------------+-------------------------+
123		\| \| edges\_to\_containers \|
124		+---------------+-------------------------+
125		\| \| paths \|
126		+---------------+-------------------------+
127		\| Link \| paths \|
128		+---------------+-------------------------+
129
130		GFA2
131		^^^^
132
133		+---------------+-------------------------+--------+
134		\| Record type \| Backreferences \| Type \|
135		+===============+=========================+========+
136		\| Segment \| dovetails\_L \| E \|
137		+---------------+-------------------------+--------+
138		\| \| dovetails\_R \| E \|
139		+---------------+-------------------------+--------+
140		\| \| edges\_to\_contained \| E \|
141		+---------------+-------------------------+--------+
142		\| \| edges\_to\_containers \| E \|
143		+---------------+-------------------------+--------+
144		\| \| internals \| E \|
145		+---------------+-------------------------+--------+
146		\| \| gaps\_L \| G \|
147		+---------------+-------------------------+--------+
148		\| \| gaps\_R \| G \|
149		+---------------+-------------------------+--------+
150		\| \| fragments \| F \|
151		+---------------+-------------------------+--------+
152		\| \| paths \| O \|
153		+---------------+-------------------------+--------+
154		\| \| sets \| U \|
155		+---------------+-------------------------+--------+
156		\| Edge \| paths \| O \|
157		+---------------+-------------------------+--------+
158		\| \| sets \| U \|
159		+---------------+-------------------------+--------+
160		\| O Group \| paths \| O \|
161		+---------------+-------------------------+--------+
162		\| \| sets \| U \|
163		+---------------+-------------------------+--------+
164		\| U Group \| sets \| U \|
165		+---------------+-------------------------+--------+
166
167		Segment backreference convenience methods
168		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
169
170		For segments, additional methods are available which combine in
171		different way the backreferences information. The
172		`dovetails_of_end` and `gaps_of_end` methods take an
173		argument ``L`` or ``R`` and return the dovetails overlaps (or gaps) of the
174		left or, respectively, right end of the segment sequence
175		(equivalent to the segment properties ``dovetails_L``/``dovetails_R`` and
176		``gaps_L``/``gaps_R``).
177
178		The segment ``containments`` property is a list of both containments where the
179		segment is the container or the contained segment. The segment ``edges``
180		property is a list of all edges (dovetails, containments and internals)
181		with a reference to the segment.
182
183		Other methods directly compute list of segments from the edges lists
184		mentioned above. The ``neighbours_L``, ``neighbours_R`` properties and
185		the `neighbours` method compute the set of segment instances which are
186		connected by dovetails to the segment.
187		The ``containers`` and ``contained``
188		properties similarly compute the set of segment instances which,
189		respectively, contains the segment, or are contained in the segment.
190
191		.. doctest::
192
193		>>> gfa = gfapy.Gfa()
194		>>> gfa.append('S\tA\t*')
195		>>> s = gfa.segment('A')
196		>>> gfa.append('S\tB\t*')
197		>>> gfa.append('S\tC\t*')
198		>>> gfa.append('L\tA\t-\tB\t+\t*')
199		>>> gfa.append('C\tA\t+\tC\t+\t10\t*')
200		>>> [str(l) for l in s.dovetails_of_end("L")]
201		['L\tA\t-\tB\t+\t*']
202		>>> s.dovetails_L == s.dovetails_of_end("L")
203		True
204		>>> s.gaps_of_end("R")
205		[]
206		>>> [str(e) for e in s.edges]
207		['L\tA\t-\tB\t+\t', 'C\tA\t+\tC\t+\t10\t']
208		>>> [str(n) for n in s.neighbours_L]
209		['S\tB\t*']
210		>>> s.containers
211		[]
212		>>> [str(c) for c in s.contained]
213		['S\tC\t*']
214
215		Multiline group definitions
216		~~~~~~~~~~~~~~~~~~~~~~~~~~~
217
218		The GFA2 specification opens the possibility (experimental) to define
219		groups on multiple lines, by using the same ID for each line defining
220		the group. This is supported by gfapy.
221
222		This means that if multiple `Ordered` or
223		`Unordered` instances connected to a Gfa object have
224		the same ``gid``, they are merged into a single instance (technically
225		the last one getting added to the graph object). The items list are
226		merged.
227
228		The tags of multiple line defining a group shall not contradict each
229		other (i.e. either are the tag names on different lines defining the
230		group all different, or, if the same tag is present on different lines,
231		the value and datatype must be the same, in which case the multiple
232		definition will be ignored).
233
234		.. doctest::
235
236		>>> gfa = gfapy.Gfa()
237		>>> gfa.add_line("U\tu1\ts1 s2 s3")
238		>>> [s.name for s in gfa.sets[-1].items]
239		['s1', 's2', 's3']
240		>>> gfa.add_line('U\tu1\t4 5')
241		>>> [s.name for s in gfa.sets[-1].items]
242		['s1', 's2', 's3', '4', '5']
243
244		Induced set and captured path
245		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
246
247		The item list in GFA2 sets and paths may not contain elements which are
248		implicitly involved. For example a path may contain segments, without
249		specifying the edges connecting them, if there is only one such edge.
250		Alternatively a path may contain edges, without explicitly indicating the
251		segments. Similarly a set may contain edges, but not the segments
252		referred to in them, or contain segments which are connected by edges,
253		without the edges themselves. Furthermore groups may refer to other
254		groups (set to sets or paths, paths to paths only), which then
255		indirectly contain references to segments and edges.
256
257		Gfapy provides methods for the computation of the sets of segments and
258		edges which are implied by an ordered or unordered group. Thereby all
259		references to subgroups are resolved and implicit elements are added, as
260		described in the specification. The computation can, therefore, only be
261		applied to connected lines. For unordered groups, this computation is
262		provided by the method ``induced_set()``, which returns an array of
263		segment and edge instances. For ordered group, the computation is
264		provided by the method ``captured_path()``, which returns a list of
265		``gfapy.OrientedLine`` instances, alternating segment and edge instances
266		(and starting and ending in segments).
267
268		The methods ``induced_segments_set()``, ``induced_edges_set()``,
269		``captured_segments()`` and ``captured_edges()`` return, respectively,
270		the list of only segments or edges, in ordered or unordered groups.
271
272		.. doctest::
273
274		>>> gfa = gfapy.Gfa()
275		>>> gfa.add_line("S\ts1\t100\t*")
276		>>> gfa.add_line("S\ts2\t100\t*")
277		>>> gfa.add_line("S\ts3\t100\t*")
278		>>> gfa.add_line("E\te1\ts1+\ts2-\t0\t10\t90\t100$\t*")
279		>>> gfa.add_line("U\tu1\ts1 s2 s3")
280		>>> u = gfa.sets[-1]
281		>>> [l.name for l in u.induced_edges_set]
282		['e1']
283		>>> [l.name for l in u.induced_segments_set ]
284		['s1', 's2', 's3']
285		>>> [l.name for l in u.induced_set ]
286		['s1', 's2', 's3', 'e1']
287
288		Disconnecting a line from a Gfa object
289		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
290
291		Lines can be disconnected using the ``rm(line)`` method of the
292		``gfapy.Gfa`` object or the ``disconnect()`` method of the line
293		instance.
294
295		.. doctest::
296
297		>>> gfa = gfapy.Gfa()
298		>>> gfa.append('S\tsA\t*')
299		>>> gfa.append('S\tsB\t*')
300		>>> line = gfa.segment("sA")
301		>>> gfa.segment_names
302		['sA', 'sB']
303		>>> gfa.rm(line)
304		>>> gfa.segment_names
305		['sB']
306		>>> line = gfa.segment('sB')
307		>>> line.disconnect()
308		>>> gfa.segment_names
309		[]
310
311		Disconnecting a line affects other lines as well. Lines which are
312		dependent on the disconnected line are disconnected as well. Any other
313		reference to disconnected lines is removed as well. In the disconnected
314		line, references to lines are transformed back to strings and
315		backreferences are deleted.
316
317		The following tables show which dependent lines are disconnected if they
318		refer to a line which is being disconnected.
319
320		GFA1
321		^^^^
322
323		+---------------+---------------------------------+
324		\| Record type \| Dependent lines \|
325		+===============+=================================+
326		\| Segment \| links (+ paths), containments \|
327		+---------------+---------------------------------+
328		\| Link \| paths \|
329		+---------------+---------------------------------+
330
331		GFA2
332		^^^^
333
334		+---------------+---------------------------------------+
335		\| Record type \| Dependent lines \|
336		+===============+=======================================+
337		\| Segment \| edges, gaps, fragments, sets, paths \|
338		+---------------+---------------------------------------+
339		\| Edge \| sets, paths \|
340		+---------------+---------------------------------------+
341		\| Sets \| sets, paths \|
342		+---------------+---------------------------------------+
343
344		Editing reference fields
345		~~~~~~~~~~~~~~~~~~~~~~~~
346
347		In connected line instances, it is not allowed to directly change the
348		content of fields containing references to other lines, as this would
349		make the state of the Gfa object invalid.
350
351		Besides the fields containing references, some other fields are
352		read-only in connected lines. Changing some of the fields would require
353		moving the backreferences to other collections (position fields of edges
354		and gaps, ``from_orient`` and ``to_orient`` of links). The overlaps
355		field of connected links is readonly as it may be necessary to identify
356		the link in paths.
357
358		Renaming an element
359		^^^^^^^^^^^^^^^^^^^
360
361		The name field of a line (e.g. segment ``name``/``sid``) is not a
362		reference and thus can be edited also in connected lines. When the name
363		of the line is changed, no manual editing of references (e.g. from/to
364		fields in links) is necessary, as all lines which refer to the line will
365		still refer to the same instance. The references to the instance in the
366		Gfa lines collections will be automatically updated. Also, the new name
367		will be correctly used when converting to string, such as when the Gfa
368		instance is written to a GFA file.
369
370		Renaming a line to a name which already exists has the same effect of
371		adding a line with that name. That is, in most cases,
372		``gfapy.NotUniqueError`` is raised. An exception are GFA2 sets and
373		paths: in this case the line will be appended to the existing line with
374		the same name (as described in "Multiline group definitions").
375
376		Adding and removing group elements
377		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
378
379		Elements of GFA2 groups can be added and removed from both connected and
380		non-connected lines, using the following methods.
381
382		To add an item to or remove an item from an unordered group, use the
383		methods ``add_item(item)`` and ``rm_item(item)``, which take as argument
384		either a string (identifier) or a line instance.
385
386		To append or prepend an item to an ordered group, use the methods
387		``append_item(item)`` and ``prepend_item(item)``. To remove the first or
388		the last item of an ordered group use the methods ``rm_first_item()``
389		and ``rm_last_item()``.
390
391		Editing read-only fields of connected lines
392		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
393
394		Editing the read-only information of edges, gaps, links, containments,
395		fragments and paths is more complicated. These lines shall be
396		disconnected before the edit and connected again to the Gfa object after
397		it. Before disconnecting a line, you should check if there are other
398		lines dependent on it (see tables above). If so, you will have to
399		disconnect these lines first, eventually update their fields and
400		reconnect them at the end of the operation.
401
402		Virtual lines
403		~~~~~~~~~~~~~
404
405		The order of the lines in GFA is not prescribed. Therefore, during
406		parsing, or constructing a Gfa in memory, it is possible that a line is
407		referenced to, before it is added to the Gfa instance. Whenever this
408		happens, Gfapy creates a "virtual" line instance.
409
410		Users do not have to handle with virtual lines, if they work with
411		complete and valid GFA files.
412
413		Virtual lines are similar to normal line instances, with some
414		limitations (they contain only limited information and it is not allowed
415		to add tags to them). To check if a line is a virtual line, one can use
416		the ``virtual`` property of the line.
417
418		As soon as the parser founds the real line corresponding to a previously
419		introduced virtual line, the virtual line is exchanged with the real
420		line and all references are corrected to point to the real line.
421
422		.. doctest::
423
424		>>> g = gfapy.Gfa()
425		>>> g.add_line("S\t1\t*")
426		>>> g.add_line("L\t1\t+\t2\t+\t*")
427		>>> l = g.dovetails[0]
428		>>> g.segment("1").virtual
429		False
430		>>> g.segment("2").virtual
431		True
432		>>> l.to_segment == g.segment("2")
433		True
434		>>> g.segment("2").dovetails == [l]
435		True
436		>>> g.add_line("S\t2\t*")
437		>>> g.segment("2").virtual
438		False
439		>>> l.to_segment == g.segment("2")
440		True
441		>>> g.segment("2").dovetails == [l]
442		True

+0

-36

~~doc/tutorial/rgfa.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3
4		.. _rgfa:
5
6		rGFA
7		----
8
9		rGFA (https://github.com/lh3/gfatools/blob/master/doc/rGFA.md)
10		is a subset of GFA1, in which only particular line types (S and L)
11		are allowed, and the S lines are required to contain the tags
12		`SN` (of type `Z`), `SO` and `SR` (of type `i`).
13
14		When working with rGFA files, it is convenient to use the `dialect="rgfa"`
15		option in the constructor `Gfa()` and in
16		func:`Gfa.from_file() <gfapy.gfa.Gfa.from_file>`.
17
18		This ensures that additional validations are performed: GFA version must be 1,
19		only rGFA-compatible lines (S,L) are allowed and that the required tags are
20		required (with the correct datatype). The validations can also be executed
21		manually using `Gfa.validate_rgfa() <gfapy.gfa.Gfa.validate_rgfa>`.
22
23		Furthermore, the `stable_sequence_names` attribute of the GFA objects
24		returns the set of stable sequence names contained in the `SN` tags
25		of the segments.
26
27		.. doctest::
28
29		>>> g = gfapy.Gfa("S\tS1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0", dialect="rgfa")
30		>>> g.segment_names
31		['S1']
32		>>> g.stable_sequence_names
33		['chr1']
34		>>> g.add_line("S\tS2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0")
35

+0

-436

~~doc/tutorial/tags.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _tags:
6
7		Tags
8		----
9
10		Each record in GFA can contain tags. Tags are fields which consist in a
11		tag name, a datatype and data. The format is ``NN:T:DATA`` where ``NN``
12		is a two-letter tag name, ``T`` is a one-letter datatype string and
13		``DATA`` is a string representing the data according to the specified
14		datatype. Tag names must be unique for each line, i.e. each line may
15		only contain a tag once.
16
17		::
18
19		# Examples of GFA tags of different datatypes:
20		"aa:i:-12"
21		"bb:f:1.23"
22		"cc:Z:this is a string"
23		"dd:A:X"
24		"ee:B:c,12,3,2"
25		"ff:H:122FA0"
26		'gg:J:["A","B"]'
27
28		Custom tags
29		~~~~~~~~~~~
30
31		Some tags are explicitly defined in the specification (these are named
32		predefined tags in Gfapy), and the user or an application can define
33		its own custom tags. These may contain lower case letters.
34
35		Custom tags are user or program specific and may of course collide with
36		the tags used by other users or programs. For this reasons, if you write
37		scripts which employ custom tags, you should always check that the
38		values are of the correct datatype and plausible.
39
40		.. doctest::
41
42		>>> line = gfapy.Line("H\txx:i:2")
43		>>> if line.get_datatype("xx") != "i":
44		... raise Exception("I expected the tag xx to contain an integer!")
45		>>> myvalue = line.xx
46		>>> if (myvalue > 120) or (myvalue % 2 == 1):
47		... raise Exception("The value in the xx tag is not an even value <= 120")
48		>>> # ... do something with myvalue
49
50		Also it is good practice to allow the user of the script to change the
51		name of the custom tags. For example, Gfapy employs the +or+ custom tag
52		to track the original segment from which a segment in the final graph is
53		derived. All methods which read or write the +or+ tag allow to specify
54		an alternative tag name to use instead of +or+, for the case that this
55		name collides with the custom tag of another program.
56
57		.. code:: python
58
59		# E.g. a method which does something with myvalue, usually stored in tag xx
60		# allows the user to specify an alternative name for the tag
61		def mymethod(line, mytag="xx"):
62		myvalue = line.get(mytag)
63		# ...
64
65		Predefined tags
66		~~~~~~~~~~~~~~~
67
68		According to the GFA specifications, predefined tag names consist of either
69		two upper case letters, or an upper case letter followed by a digit.
70		The GFA1 specification predefines tags for each line type, while GFA2
71		only predefines tags for the header and edges.
72
73		While tags with the predefined names are allowed to be added to any line,
74		when they are used in the lines mentiones in the specification (e.g. `VN`
75		in the header) gfapy checks that the datatype is the one prescribed by
76		the specification (e.g. `VN` must be of type `Z`). It is not forbidden
77		to use the same tags in other contexts, but in this case, the datatype
78		restriction is not enforced.
79
80		+------------+------------+-----------------------+
81		\| Tag \| Type \| Line types \| GFA version \|
82		+============+============+=======================+
83		\| VN \| Z \| H \| 1,2 \|
84		+-----+------+------------+-----------------------+
85		\| TS \| i \| H,S \| 2 \|
86		+-----+------+------------+-----------------------+
87		\| LN \| i \| S \| 1 \|
88		+-----+------+------------+-----------------------+
89		\| RC \| i \| S,L,C \| 1 \|
90		+-----+------+------------+-----------------------+
91		\| FC \| i \| S,L \| 1 \|
92		+-----+------+------------+-----------------------+
93		\| KC \| i \| S,L \| 1 \|
94		+-----+------+------------+-----------------------+
95		\| SH \| H \| S \| 1 \|
96		+-----+------+------------+-----------------------+
97		\| UR \| Z \| S \| 1 \|
98		+-----+------+------------+-----------------------+
99		\| MQ \| i \| L \| 1 \|
100		+-----+------+------------+-----------------------+
101		\| NM \| i \| L,i \| 1 \|
102		+-----+------+------------+-----------------------+
103		\| ID \| Z \| L,C \| 1 \|
104		+-----+------+------------+-----------------------+
105
106		::
107
108		"VN:Z:1.0" # VN => predefined tag
109		"z5:Z:1.0" # z5 first char is downcase => custom tag
110		"XX:Z:aaa" # XX upper case, but not predefined => custom tag
111
112		# not forbidden, but not recommended:
113		"zZ:Z:1.0" # => mixed case, first char downcase => custom tag
114		"Zz:Z:1.0" # => mixed case, first char upcase => custom tag
115		"vn:Z:1.0" # => same name as predefined tag, but downcase => custom tag
116
117		Datatypes
118		~~~~~~~~~
119
120		The following table summarizes the datatypes available for tags:
121
122		+----------+-----------------+---------------------------+----------------------+
123		\| Symbol \| Datatype \| Example \| Python class \|
124		+==========+=================+===========================+======================+
125		\| Z \| string \| This is a string \| str \|
126		+----------+-----------------+---------------------------+----------------------+
127		\| i \| integer \| -12 \| int \|
128		+----------+-----------------+---------------------------+----------------------+
129		\| f \| float \| 1.2E-5 \| float \|
130		+----------+-----------------+---------------------------+----------------------+
131		\| A \| char \| X \| str \|
132		+----------+-----------------+---------------------------+----------------------+
133		\| J \| JSON \| [1,{"k1":1,"k2":2},"a"] \| list/dict \|
134		+----------+-----------------+---------------------------+----------------------+
135		\| B \| numeric array \| f,1.2,13E-2,0 \| gfapy.NumericArray \|
136		+----------+-----------------+---------------------------+----------------------+
137		\| H \| byte array \| FFAA01 \| gfapy.ByteArray \|
138		+----------+-----------------+---------------------------+----------------------+
139
140		Validation
141		~~~~~~~~~~
142
143		The tag names must consist of a letter and a digit or two letters.
144
145		::
146
147		"KC:i:1" # => OK
148		"xx:i:1" # => OK
149		"x1:i:1" # => OK
150		"xxx:i:1" # => error: name is too long
151		"x:i:1" # => error: name is too short
152		"11:i:1" # => error: at least one letter must be present
153
154		The datatype must be one of the datatypes specified above. For
155		predefined tags, Gfapy also checks that the datatype given in the
156		specification is used.
157
158		::
159
160		"xx:X:1" # => error: datatype X is unknown
161		"VN:i:1" # => error: VN must be of type Z
162
163		The data must be a correctly formatted string for the specified datatype
164		or a Python object whose string representation is a correctly formatted
165		string.
166
167		.. doctest::
168
169		# current value: xx:i:2
170		>>> line = gfapy.Line("S\tA\t*\txx:i:2")
171		>>> line.xx = 1
172		>>> line.xx
173		1
174		>>> line.xx = "3"
175		>>> line.xx
176		3
177		>>> line.xx = "A"
178		>>> line.xx
179		Traceback (most recent call last):
180		...
181		gfapy.error.FormatError: ...
182
183		Depending on the validation level, more or less checks are done
184		automatically (see :ref:`validation` chapter). Per default - validation level
185		(1) - validation is performed only during parsing or accessing values
186		the first time, therefore the user must perform a manual validation if
187		he changes values to something which is not guaranteed to be correct. To
188		trigger a manual validation, the user can call the method
189		``validate_field(fieldname)`` to validate a single tag, or
190		``validate()`` to validate the whole line, including all tags.
191
192		.. doctest::
193
194		>>> line = gfapy.Line("S\tA\t*\txx:i:2", vlevel = 0)
195		>>> line.validate_field("xx")
196		>>> line.validate()
197		>>> line.xx = "A"
198		>>> line.validate_field("xx")
199		Traceback (most recent call last):
200		...
201		gfapy.error.FormatError: ...
202		>>> line.validate()
203		Traceback (most recent call last):
204		...
205		gfapy.error.FormatError: ...
206		>>> line.xx = "3"
207		>>> line.validate_field("xx")
208		>>> line.validate()
209
210		Reading and writing tags
211		~~~~~~~~~~~~~~~~~~~~~~~~
212
213		Tags can be read using a property on the Gfapy line object, which is
214		called as the tag (e.g. line.xx). A special version of the property
215		prefixed by ``try_get_`` raises an error if the tag was not available
216		(e.g. ``line.try_get_LN``), while the tag property (e.g. ``line.LN``)
217		would return ``None`` in this case. Setting the value is done assigning
218		a value to it the tag name method (e.g. ``line.TS = 120``). In
219		alternative, the ``set(fieldname, value)``, ``get(fieldname)`` and
220		``try_get(fieldname)`` methods can also be used. To remove a tag from a
221		line, use the ``delete(fieldname)`` method, or set its value to
222		``None``. The ``tagnames`` property Line instances is a list of
223		the names (as strings) of all defined tags for a line.
224
225
226		.. doctest::
227
228		>>> line = gfapy.Line("S\tA\t*\txx:i:1", vlevel = 0)
229		>>> line.xx
230		1
231		>>> line.xy is None
232		True
233		>>> line.try_get_xx()
234		1
235		>>> line.try_get_xy()
236		Traceback (most recent call last):
237		...
238		gfapy.error.NotFoundError: ...
239		>>> line.get("xx")
240		1
241		>>> line.try_get("xy")
242		Traceback (most recent call last):
243		...
244		gfapy.error.NotFoundError: ...
245		>>> line.xx = 2
246		>>> line.xx
247		2
248		>>> line.xx = "a"
249		>>> line.tagnames
250		['xx']
251		>>> line.xy = 2
252		>>> line.xy
253		2
254		>>> line.set("xy", 3)
255		>>> line.get("xy")
256		3
257		>>> line.tagnames
258		['xx', 'xy']
259		>>> line.delete("xy")
260		3
261		>>> line.xy is None
262		True
263		>>> line.xx = None
264		>>> line.xx is None
265		True
266		>>> line.try_get("xx")
267		Traceback (most recent call last):
268		...
269		gfapy.error.NotFoundError: ...
270		>>> line.tagnames
271		[]
272
273		When a tag is read, the value is converted into an appropriate object
274		(see Python classes in the datatype table above). When setting a value,
275		the user can specify the value of a tag either as a Python object, or as
276		the string representation of the value.
277
278		.. doctest::
279
280		>>> line = gfapy.Line('H\txx:i:1\txy:Z:TEXT\txz:J:["a","b"]')
281		>>> line.xx
282		1
283		>>> isinstance(line.xx, int)
284		True
285		>>> line.xy
286		'TEXT'
287		>>> isinstance(line.xy, str)
288		True
289		>>> line.xz
290		['a', 'b']
291		>>> isinstance(line.xz, list)
292		True
293
294		The string representation of a tag can be read using the
295		``field_to_s(fieldname)`` method. The default is to only output the
296		content of the field. By setting \`\`tag: true\`\`\`, the entire tag is
297		output (name, datatype, content, separated by colons). An exception is
298		raised if the field does not exist.
299
300		.. doctest::
301
302		>>> line = gfapy.Line("H\txx:i:1")
303		>>> line.xx
304		1
305		>>> line.field_to_s("xx")
306		'1'
307		>>> line.field_to_s("xx", tag=True)
308		'xx:i:1'
309
310		Datatype of custom tags
311		~~~~~~~~~~~~~~~~~~~~~~~
312
313		The datatype of an existing custom field (but not of predefined fields)
314		can be changed using the ``set_datatype(fieldname, datatype)`` method.
315		The current datatype specification can be read using
316		``get_datatype(fieldname)``.
317
318		.. doctest::
319
320		>>> line = gfapy.Line("H\txx:i:1")
321		>>> line.get_datatype("xx")
322		'i'
323		>>> line.set_datatype("xx", "Z")
324		>>> line.get_datatype("xx")
325		'Z'
326
327		If a new custom tag is specified, Gfapy selects the correct datatype for
328		it: i/f for numeric values, J/B for arrays, J for hashes and Z for
329		strings and strings. If the user wants to specify a different datatype,
330		he may do so by setting it with ``set_datatype()`` (this can be done
331		also before assigning a value, which is necessary if full validation is
332		active).
333
334		.. doctest::
335
336		>>> line = gfapy.Line("H")
337		>>> line.xx = "1"
338		>>> line.xx
339		'1'
340		>>> line.set_datatype("xy", "i")
341		>>> line.xy = "1"
342		>>> line.xy
343		1
344
345		Arrays of numerical values
346		~~~~~~~~~~~~~~~~~~~~~~~~~~
347
348		``B`` and ``H`` tags represent array with particular constraints (e.g.
349		they can only contain numeric values, and in some cases the values must
350		be in predefined ranges). In order to represent them correctly and allow
351		for validation, Python classes have been defined for both kind of tags:
352		``gfapy.ByteArray`` for ``H`` and ``gfapy.NumericArray`` for ``B``
353		fields.
354
355		Both are subclasses of list. Object of the two classes can be created by
356		passing an existing list or the string representation to the class
357		constructor.
358
359		.. doctest::
360
361		>>> # create a byte array instance
362		>>> gfapy.ByteArray([12,3,14])
363		b'\x0c\x03\x0e'
364		>>> gfapy.ByteArray("A012FF")
365		b'\xa0\x12\xff'
366		>>> # create a numeric array instance
367		>>> gfapy.NumericArray.from_string("c,12,3,14")
368		[12, 3, 14]
369		>>> gfapy.NumericArray([12,3,14])
370		[12, 3, 14]
371
372		Instances of the classes behave as normal lists, except that they
373		provide a #validate() method, which checks the constraints, and that
374		their string representation is the GFA string representation of the
375		field value.
376
377		.. doctest::
378
379		>>> gfapy.NumericArray([12,1,"1x"]).validate()
380		Traceback (most recent call last):
381		...
382		gfapy.error.ValueError
383		>>> str(gfapy.NumericArray([12,3,14]))
384		'C,12,3,14'
385		>>> gfapy.ByteArray([12,1,"1x"]).validate()
386		Traceback (most recent call last):
387		...
388		gfapy.error.ValueError
389		>>> str(gfapy.ByteArray([12,3,14]))
390		'0C030E'
391
392		For numeric values, the `compute_subtype` method allows to compute
393		the subtype which will be used for the string representation. Unsigned
394		subtypes are used if all values are positive. The smallest possible
395		subtype range is selected. The subtype may change when the range of the
396		elements changes.
397
398		.. doctest::
399
400		>>> gfapy.NumericArray([12,13,14]).compute_subtype()
401		'C'
402
403		Special cases: custom records, headers, comments and virtual lines.
404		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405
406		GFA2 allows custom records, introduced by record type strings other than
407		the predefined ones. Gfapy uses a pragmatical approach for identifying
408		tags in custom records, and tries to interpret the rightmost fields as
409		tags, until the first field from the right raises an error; all
410		remaining fields are treated as positional fields.
411
412		::
413
414		"X a b c xx:i:12" # => xx is tag, a, b, c are positional fields
415		"Y a b xx:i:12 c" # => all positional fields, as c is not a valid tag
416
417		For easier access, the entire header of the GFA is summarized in a
418		single line instance. A class (`FieldArray`) has been defined to
419		handle the special case when multiple H lines define the same tag (see
420		:ref:`header` chapter for details).
421
422		Comment lines are represented by a subclass of the same class
423		(`Line`) as the records. However, they cannot contain tags: the
424		entire line is taken as content of the comment. See the :ref:`comments`
425		chapter for more information about comments.
426
427		::
428
429		"# this is not a tag: xx:i:1" # => xx is not a tag, xx:i:1 is part of the comment
430
431		Virtual instances of the `Line` class (e.g. segment instances automatically
432		created because of not yet resolved references found in edges) cannot be
433		modified by the user, and tags cannot be specified for them. This
434		includes all instances of the `Unknown` class. See the
435		:ref:`references` chapter for more information about virtual lines.

+0

-78

~~doc/tutorial/validation.rst~~ less more

0		.. _validation:
1
2		Validation
3		----------
4
5		Different validation levels are available. They represent different
6		compromises between speed and warrant of validity. The validation level
7		can be specified when the :class:`~gfapy.gfa.Gfa` object is created, using the
8		``vlevel`` parameter of the constructor and of the
9		`from_file` method. Four levels of validation are defined
10		(0 = no validation, 1 = validation by reading, 2 = validation by reading
11		and writing, 3 = continuous validation). The default validation level
12		value is 1.
13
14		Manual validation
15		~~~~~~~~~~~~~~~~~
16
17		Independently from the validation level chosen, the user can always check the
18		value of a field calling
19		:meth:`~gfapy.line.common.validate.Validate.validate_field` on the line
20		instance. If no exception is raised, the field content is valid.
21
22		To check if the entire content of the line is valid, the user can call
23		:meth:`~gfapy.line.common.validate.Validate.validate` on the line instance.
24		This will check all fields and perform cross-field validations, such as
25		comparing the length of the sequence of a GFA1 segment, to the value of the LN
26		tag (if present).
27
28		It is also possible to validate the structure of the GFA, for example to
29		check if there are unresolved references to lines. To do this, use the
30		:meth:`~gfapy.gfa.Gfa.validate` of the :class:`~gfapy.gfa.Gfa` instance.
31
32		No validations
33		~~~~~~~~~~~~~~
34
35		If the validation is set to 0, Gfapy will try to accept any input and
36		never raise an exception. This is not always possible, and in some
37		cases, an exception will still be raised, if the data is invalid.
38
39		Validation when reading
40		~~~~~~~~~~~~~~~~~~~~~~~
41
42		If the validation level is set to 1 or higher, basic validations will be
43		performed, such as checking the number of positional fields, the
44		presence of duplicated tags, the tag datatype of predefined tags.
45		Additionally, all tags will be validated, either during parsing or on
46		first access. Record-type cross-field validations will also be
47		performed.
48
49		In other words, a validation of 1 means that Gfapy guarantees (as good
50		as it can) that the GFA content read from a file is valid, and will
51		raise an exception on accessing the data if not.
52
53		The user is supposed to call `validate_field` after changing
54		a field content to something which can be potentially invalid, or
55		:meth:`~gfapy.line.common.validate.Validate.validate` if potentially
56		cross-field validations could fail.
57
58		Validation when writing
59		~~~~~~~~~~~~~~~~~~~~~~~
60
61		Setting the level to 2 will perform all validations described above,
62		plus validate the fields content when their value is written to string.
63
64		In other words, a validation of 2 means that Gfapy guarantee (as good as
65		it can) that the GFA content read from a file and written to a file is
66		valid and will raise an exception on accessing the data or writing to
67		file if not.
68
69		Continuous validation
70		~~~~~~~~~~~~~~~~~~~~~
71
72		If the validation level is set to 3, all validations for lower levels
73		described above are run, plus a validation of fields contents each time
74		a setter method is used.
75
76		A validation of 3 means that Gfapy guarantees (as good as it can) that
77		the GFA content is always valid.

+0

-1

gfapy/alignment/cigar.py less more

0	0	import re
1	1	import gfapy
2		from .alignment import Alignment
3	2
4	3	class CIGAR(list):
5	4	"""

+0

-1

gfapy/alignment/placeholder.py less more

0	0	import gfapy
1		from .alignment import Alignment
2	1
3	2	class AlignmentPlaceholder(gfapy.Placeholder):
4	3	"""

+0

-1

gfapy/alignment/trace.py less more

0	0	import gfapy
1		from .alignment import Alignment
2	1
3	2	class Trace(list):
4	3	"""Trace alignment.

+2

-2

gfapy/field/byte_array.py less more

19	19	if isinstance(obj, gfapy.ByteArray):
20	20	return str(obj)
21	21	if isinstance(obj, list):
22		return str(ByteArray(obj))
	22	return str(gfapy.ByteArray(obj))
23	23	elif isinstance(obj, str):
24	24	return obj
25	25	else:

32	32	if isinstance(obj, gfapy.ByteArray):
33	33	return str(obj)
34	34	elif isinstance(obj, list):
35		return str(ByteArray(obj))
	35	return str(gfapy.ByteArray(obj))
36	36	elif isinstance(obj, str):
37	37	validate_encoded(obj)
38	38	return obj

+0

-1

gfapy/field/field.py less more

0	0	import gfapy
1	1	import builtins
2		import re
3	2	from .validator import Validator
4	3	from .parser import Parser
5	4	from .writer import Writer

+1

-1

gfapy/field/identifier_list_gfa2.py less more

26	26	if not re.match("^[!-~]+$", elem):
27	27	raise gfapy.FormatError(
28	28	"the list contains an invalid GFA2 identifier ({})\n"
29		.format(repr(string))+
	29	.format(repr(elem))+
30	30	"(it contains spaces and/or non-printable characters)")
31	31	else:
32	32	raise gfapy.TypeError(

+2

-2

gfapy/field/optional_identifier_gfa2.py less more

24	24	pass
25	25	elif isinstance(obj, gfapy.Line):
26	26	validate_encoded(obj.name)
27		elif isinstance(obj, String):
	27	elif isinstance(obj, str):
28	28	validate_encoded(obj)
29	29	else:
30	30	raise gfapy.TypeError(

48	48	def encode(obj):
49	49	if isinstance(obj, gfapy.Placeholder):
50	50	return str(obj)
51		elif isinstance(obj, String):
	51	elif isinstance(obj, str):
52	52	obj = str(obj)
53	53	elif isinstance(obj, gfapy.Line):
54	54	obj = str(obj.name)

+1

-1

gfapy/field/oriented_identifier_list_gfa1.py less more

24	24	elem.validate()
25	25	if not re.match(r"^[!-)+-<>-~][!-~]*$", elem.name):
26	26	raise gfapy.FormatError(
27		"#{elem.name} is not a valid GFA1 segment name\n".format(elem.name)+
	27	"{} is not a valid GFA1 segment name\n".format(elem.name)+
28	28	"(it does not match [!-)+-<>-~][!-~]*)")
29	29
30	30	def unsafe_encode(obj):

+1

-1

gfapy/field/oriented_identifier_list_gfa2.py less more

23	23	elem.validate()
24	24	if not re.match(r"^[!-~]+$", elem.name):
25	25	raise gfapy.FormatError(
26		"the list contains an invalid GFA2 identifier\n".format(elem.name)+
	26	"the list contains an invalid GFA2 identifier {}\n".format(elem.name)+
27	27	"(it contains spaces and/or non-printable characters)")
28	28	if not elem.orient in ["+", "-"]:
29	29	raise gfapy.FormatError(

+0

-2

gfapy/graph_operations/artifacts.py less more

0		import gfapy
1
2	0	class Artifacts:
3	1
4	2	def remove_small_components(self, minlen):

+0

-2

gfapy/graph_operations/copy_number.py less more

0		import gfapy
1
2	0	class CopyNumber:
3	1
4	2	def set_default_count_tag(self, tag):

+0

-1

gfapy/graph_operations/graph_operations.py less more

0		import gfapy
1	0	from .artifacts import Artifacts
2	1	from .copy_number import CopyNumber
3	2	from .invertible_segments import InvertibleSegments

+5

-2

gfapy/graph_operations/invertible_segments.py less more

	0	from itertools import groupby
	1	import gfapy
	2
0	3	class InvertibleSegments:
1	4
2	5	def randomly_orient_invertibles(self):
3	6	''' Selects a random orientation for all invertible segments.
4	7
5	8	For the definition of invertible segment, see Gonnella and Kurtz (2016).'''
6		for s in segment_names:
	9	for sn in self.segment_names:
7	10	if self._segment_same_links_both_ends(sn):
8	11	self._randomly_orient_proven_invertible_segment(sn)
9	12

65	68	return [list(v) for k,v in groupby(sorted(links,key=sig),key=sig)]
66	69
67	70	def _annotate_random_orientation(self, segment_name):
68		s = self.try_get_segment(segment_name)
	71	segment = self.try_get_segment(segment_name)
69	72	n = segment.name.split("_")
70	73	pairs = 0
71	74	pos = [1, segment.LN]

+2

-2

gfapy/graph_operations/linear_paths.py less more

2	2	class LinearPaths:
3	3
4	4	def linear_path(self, segment, exclude = None):
5		"""Finnd a linear path which contains the specified segment
	5	"""Find a linear path which contains the specified segment
6	6
7	7	Parameters:
8	8	segment (str, Line): the segment to analyse

333	333	merged.LN = len(merged.sequence)
334	334	elif self._vlevel > 0 and merged.LN != len(merged.sequence):
335	335	raise gfapy.InconsistencyError(
336		"Computed sequence length {} ".format(merged.sequence.length)+
	336	"Computed sequence length {} ".format(len(merged.sequence))+
337	337	"and computed LN {} differ".format(merged.LN))
338	338	if merged.length is not None:
339	339	for count_tag in ["KC", "RC", "FC"]:

+4

-4

gfapy/graph_operations/multiplication.py less more

50	50	raise gfapy.ArgumentError("Mulitiplication factor must be >= 0"+
51	51	" ({} found)".format(factor))
52	52	elif factor == 0:
53		if conserve_components and factor == 1 and is_cut_segment(segment):
	53	if conserve_components and factor == 1 and self.is_cut_segment(segment):
54	54	return self
55	55	else:
56	56	self.rm(segment)

97	97	processed_circulars = set()
98	98	for l in segment.dovetails + segment.containments:
99	99	if l.is_circular():
100		if l not in processed_circular:
	100	if l not in processed_circulars:
101	101	self.__divide_counts(l, factor)
102		processed_circular.append(l)
	102	processed_circulars.add(l)
103	103	else:
104	104	self.__divide_counts(l, factor)
105	105

121	121	def _select_distribute_end(self, links_distribution_policy,
122	122	segment_name, factor):
123	123	if links_distribution_policy not in self.LINKS_DISTRIBUTION_POLICY:
124		raise gfa.ArgumentError("Unknown links distribution policy {}\n".format(\
	124	raise gfapy.ArgumentError("Unknown links distribution policy {}\n".format(\
125	125	links_distribution_policy)+"accepted values are: {}".format(\
126	126	", ".join(self.LINKS_DISTRIBUTION_POLICY)))
127	127	if links_distribution_policy == "off":

+5

-5

gfapy/graph_operations/p_bubbles.py less more

23	23	count_tag=self.default["count_tag"]
24	24	if unit_length is None:
25	25	unit_length=self.default["unit_length"]
26		s1 = segment(segment_end1.segment)
27		s2 = segment(segment_end2.segment)
28		et1 = segment(segment_end1.end_type)
29		et2 = segment(segment_end2.end_type)
	26	s1 = self.try_get_segment(segment_end1.segment)
	27	s2 = self.try_get_segment(segment_end2.segment)
	28	et1 = segment_end1.end_type
	29	et2 = segment_end2.end_type
30	30	n1 = sorted(s1.neighbours(et1), key=lambda s:s.name)
31	31	n2 = sorted(s2.neighbours(et2), key=lambda s:s.name)
32	32	assert list(n1) == [os.inverted() for os in n2]

44	44	unit_length=unit_length) for s in alternatives]
45	45	alternatives.pop(coverages.index(max(coverages)))
46	46	for s in alternatives:
47		segment(s[0]).disconnect()
	47	self.try_get_segment(s[0]).disconnect()

+2

-2

gfapy/graph_operations/redundant_linear_paths.py less more

3	3
4	4	def _junction_junction_paths(self, sn, exclude):
5	5	retval = []
6		exclude.append(sn)
	6	exclude.add(sn)
7	7	s = self.segment(sn)
8	8	for dL in s.dovetails_L:
9	9	eL = dL.other_end(gfapy.SegmentEnd(s, "L"))

76	76	elif self._version == "gfa2":
77	77	mln = len(merged.sequence)
78	78	tmp_link = gfapy.line.edge.GFA2(["*",merged.name+"+", \
79		last_name+("-" if is_reversed else "+"),
	79	last.name+("-" if is_reversed else "+"),
80	80	str(mln - ln), "{}$".format(mln),
81	81	str(ln-1) if is_reversed else "0", # on purpose fake
82	82	"{}$".format(ln) if is_reversed else "1", # on purpose fake

+6

-6

gfapy/graph_operations/superfluous_links.py less more

23	23	oe = {}
24	24	for et in ["L", "R"]:
25	25	oe[et] = l[et][0].other_end(se[et])
26		if oe[:L] == oe[:R]:
	26	if oe["L"] == oe["R"]:
27	27	return
28	28	for et in ["L", "R"]:
29	29	self._delete_other_links(oe[et], se[et],
30	30	conserve_components=conserve_components)
31	31	else:
32		if l[:L].size == 1:
	32	if l["L"].size == 1:
33	33	et = "L"
34		elif l[:R].size == 1:
	34	elif l["R"].size == 1:
35	35	et = "R"
36	36	else:
37	37	return

51	51	removed, if their removal does not split connected components
52	52	of the graph (considering as connections only dovetail overlaps)
53	53	"""
54		for sn in segment_names:
	54	for sn in self.segment_names:
55	55	self.enforce_segment_mandatory_links(sn, conserve_components=
56	56	conserve_components)
57	57

63	63	Parameters:
64	64	segment (str, Line): the segment
65	65	"""
66		if not isinstance(segment, gfa.Line):
	66	if not isinstance(segment, gfapy.Line):
67	67	segment = self.try_get_segment(segment)
68	68	for e in segment.dovetails:
69	69	if e.from_segment == e.to_segment:

71	71
72	72	def remove_self_links(self):
73	73	"""Remove all dovetail overlap of segments to themselves, if any."""
74		for sn in segment_names:
	74	for sn in self.segment_names:
75	75	self.remove_self_link(sn)

+10

-9

gfapy/graph_operations/topology.py less more

25	25	c[et] = set()
26	26	visited = set()
27	27	segend = link.get("from") if et == "from" else link.to
28		visited.append(segend.name)
29		visited.append(link.other_end(segend).name)
	28	visited.add(segend.name)
	29	visited.add(link.other_end(segend).name)
30	30	self.__traverse_component(segend, c[et], visited)
31	31	return c["from"] != c["to"]
32	32

49	49	start_points = set()
50	50	for et in ["L", "R"]:
51	51	for l in segment.dovetails_of_end(et):
52		start_points.append(l.other_end(\
53		gfapy.SegmentEnd(segment_name, et)).inverted())
	52	start_points.add(l.other_end(\
	53	gfapy.SegmentEnd(segment.name, et)).inverted())
54	54	cc = []
55	55	for start_point in start_points:
56	56	cc.append(set())
57	57	visited = set()
58		visited.append(segment_name)
59		traverse_component(start_point, cc[-1], visited)
	58	visited.add(segment.name)
	59	self.__traverse_component(start_point, cc[-1], visited)
60	60	return any(c != cc[0] for c in cc)
61	61
62	62	def segment_connected_component(self, segment, visited = None):

79	79	segment_name = segment
80	80	segment = self.segment(segment)
81	81	visited.add(segment_name)
82		c = [segment]
	82	c = set()
	83	c.add(segment)
83	84	for e in ["L", "R"]:
84	85	self.__traverse_component(gfapy.SegmentEnd(segment, e), c, visited)
85		return c
	86	return list(c)
86	87
87	88	def connected_components(self):
88	89	"""Compute the connected components of the graph.

183	184	if sn in visited:
184	185	continue
185	186	visited.add(sn)
186		c.append(s)
	187	c.add(s)
187	188	for e in ["L","R"]:
188	189	self.__traverse_component(gfapy.SegmentEnd(s, e), c, visited)

+0

-1

gfapy/lastpos.py less more

0	0	import gfapy
1		import copy
2	1	from functools import total_ordering
3	2
4	3	@total_ordering

+0

-1

gfapy/line/comment/comment.py less more

0		import gfapy
1	0	from ..line import Line
2	1	from .construction import Construction
3	2	from .tags import Tags

+0

-2

gfapy/line/comment/writer.py less more

0		import gfapy
1
2	0	class Writer:
3	1	def __str__(self):
4	2	return "#" + str(self.spacer) + str(self.content)

+0

-1

gfapy/line/common/cloning.py less more

0		import gfapy
1	0	import json
2	1	from copy import deepcopy
3	2

+1

-1

gfapy/line/common/construction.py less more

298	298	DynamicField(partial(get_method, k = k),
299	299	partial(set_method, k = k)))
300	300	def all_references(self):
301		return [ item for item in values for values in self._refs ]
	301	return [ item for item in [ values for values in self._refs ] ]
302	302
303	303	@classmethod
304	304	def register_extension(cls, references=[]):

+0

-2

gfapy/line/common/default_record_definition.py less more

0		import gfapy
1
2	0	class DefaultRecordDefinition:
3	1	"""
4	2	Provides default values for the constants for the definition of record types.

+1

-1

gfapy/line/common/dynamic_fields.py less more

30	30	return super().__setattr__(name, value)
31	31	else:
32	32	attr.set(self, value)
33		except AttributeError as err:
	33	except AttributeError:
34	34	return self._set_dynamic_field(name, value)
35	35
36	36	def _get_dynamic_field(self, name, err):

+1

-1

gfapy/line/common/validate.py less more

44	44	self._validate_predefined_tag_type(n, self._field_datatype(n))
45	45	elif not self._is_valid_custom_tagname(n):
46	46	raise gfapy.FormatError("Custom tag names must consist in a letter "+
47		"and a digit or two letters\nFound: {}".format(tagname))
	47	"and a digit or two letters\nFound: {}".format(n))
48	48
49	49	def _validate_predefined_tag_type(self, tagname, datatype):
50	50	if datatype != self.__class__.DATATYPE[tagname]:

+1

-1

gfapy/line/common/writer.py less more

158	158	(str, str, object) list
159	159	"""
160	160	retval = []
161		for of in tagnames:
	161	for of in self.tagnames:
162	162	retval.append([of, self.get_datatype(of), self.get(of)])
163	163	return retval

+2

-0

gfapy/line/edge/containment/to_gfa2.py less more

	0	import gfapy
	1
0	2	class ToGFA2:
1	3
2	4	@property

+1

-1

gfapy/line/edge/gfa2/validation.py less more

17	17	raise gfapy.InconsistencyError(
18	18	"Edge: {}\n".format(str(self))+
19	19	"Field {}: $ after ".format(fn)+
20		"non-last position\n".format(str(pos))+
	20	"non-last position {}\n".format(str(pos))+
21	21	"Segment: {}".format(str(seg)))

+3

-3

gfapy/line/edge/link/equivalence.py less more

8	8	Thereby, tags are not considered.
9	9	"""
10	10	hash(str(self.from_end)) + \
11		hash(str(to_end)) + \
12		hash(str(overlap)) + \
13		hash(str(overlap.complement()))
	11	hash(str(self.to_end)) + \
	12	hash(str(self.overlap)) + \
	13	hash(str(self.overlap.complement()))
14	14
15	15	def is_eql(self, other):
16	16	"""

+0

-1

gfapy/line/gap/gap.py less more

0		import gfapy
1	0	from .references import References
2	1	from ..line import Line
3	2

+2

-0

gfapy/line/group/gfa2/same_id.py less more

	0	import gfapy
	1
0	2	class SameID:
1	3
2	4	def _process_not_unique(self, previous):

+8

-8

gfapy/line/group/ordered/captured_path.py less more

86	86	oss = [oriented_edge.line.sid1, oriented_edge.line.sid2]
87	87	if oriented_edge.orient == "-":
88	88	for i in range(len(oss)):
89		oss[i].invert()
	89	oss[i] = oss[i].inverted()
90	90	if len(items) > 1:
91	91	nextitem = items[1]
92	92	if isinstance(nextitem.line, gfapy.line.segment.GFA2):

98	98	oss_of_next = [nextitem.line.sid1, nextitem.line.sid2]
99	99	if oriented_edge.orient == "-":
100	100	for i in range(len(oss_of_next)):
101		oss_of_next[i].invert()
	101	oss_of_next[i] = oss_of_next[i].inverted()
102	102	if oss[0] in oss_of_next:
103	103	oss.reverse()
104	104	# if oss_of_next have no element in common with oss an error will be
105	105	# raised in the next iteration, so does not need to be handled here
106	106	elif isinstance(nextitem.line, gfapy.line.group.Ordered):
107		subpath = item.line.captured_path
	107	subpath = nextitem.line.captured_path
108	108	if not subpath: return# does not need to be further handled here
109		if item.orient == "+":
110		firstsubpathsegment = supath[0]
	109	if nextitem.orient == "+":
	110	firstsubpathsegment = subpath[0]
111	111	else:
112		firstsubpathsegment = supath[-1].inverted()
	112	firstsubpathsegment = subpath[-1].inverted()
113	113	if firstsubpathsegment == oss[0]:
114	114	oss.reverse()
115	115	# if oss does not include in firstsubpathsegment

128	128	possible_prev = [oriented_edge.line.sid1, oriented_edge.line.sid2]
129	129	if oriented_edge.orient == "-":
130	130	for i, v in enumerate(possible_prev):
131		possible_prev[i].invert()
	131	possible_prev[i] = possible_prev[i].inverted()
132	132	if prev_os == possible_prev[0]:
133	133	path.append(possible_prev[1])
134	134	elif prev_os == possible_prev[1]:

166	166	"Expected element:\n"+
167	167	" {} ({})\n".format(path[-1], path[-1].line)+
168	168	"Current element:\n"+
169		" {} ({})\n".format(segment, segment.line))
	169	" {} ({})\n".format(oriented_segment, oriented_segment.line))
170	170
171	171	def _check_s_to_e_contiguity(self, path, oriented_segment):
172	172	# check that segment is an extremity of path[-1]

+5

-3

gfapy/line/group/ordered/references.py less more

	0	import gfapy
	1
0	2	class References:
1	3
2	4	def append_item(self, item):

42	44	self.items = self.items[1:]
43	45	else:
44	46	self.items[0].update_reference(self, "paths")
45		self._delete_reference(items[0], "items")
	47	self._delete_reference(self.items[0], "items")
46	48	self.compute_induced_set() # check contiguity
47	49
48	50	def rm_last_item(self):

65	67	if isinstance(item.line, gfapy.Line):
66	68	item.line = item.name
67	69	if append:
68		items.append(item)
	70	self.items.append(item)
69	71	else:
70		items.insert(0, item)
	72	self.items.insert(0, item)
71	73
72	74	def _add_item_to_connected_group(self, item, append = True):
73	75	item.line = self.prepare_and_check_ref(item.line)

+2

-0

gfapy/line/group/ordered/to_gfa1.py less more

19	19	gfapy.Field._validate_gfa_field(oline.line.overlap, "alignment_gfa1")
20	20	overlaps.append(str(oline.line.overlap))
21	21	a.append(",".join(overlaps))
	22	for tn in self.tagnames:
	23	a.append(self.field_to_s(tn, tag=True))
22	24	return a

+2

-0

gfapy/line/group/path/to_gfa2.py less more

11	11	a = ["O"]
12	12	a.append(self.field_to_s("path_name"))
13	13	a.append(" ".join(items))
	14	for tn in self.tagnames:
	15	a.append(self.field_to_s(tn, tag=True))
14	16	return a

+4

-2

gfapy/line/group/unordered/references.py less more

	0	import gfapy
	1
0	2	class References:
1	3
2	4	def add_item(self, item):

38	40	if isinstance(item, str):
39	41	item = self._gfa.line(item)
40	42	self._check_item_included(item)
41		line._delete_reference(self, "sets")
42		self._delete_reference(line, "items")
	43	item._delete_reference(self, "sets")
	44	self._delete_reference(item, "items")
43	45	return None
44	46
45	47	def _check_item_included(self, item):

+1

-3

gfapy/line/line.py less more

0	0	from .common.construction import Construction
1		from .common.dynamic_fields import DynamicFields, DynamicField
	1	from .common.dynamic_fields import DynamicFields
2	2	from .common.writer import Writer
3	3	from .common.version_conversion import VersionConversion
4	4	from .common.field_datatype import FieldDatatype

11	11	from .common.disconnection import Disconnection
12	12	from .common.validate import Validate
13	13	from .common.default_record_definition import DefaultRecordDefinition
14
15		import gfapy
16	14
17	15	class Line(Construction, DynamicFields, Writer, VersionConversion,
18	16	FieldDatatype, FieldData, Equivalence, Cloning, Connection,

+1

-1

gfapy/line/segment/gfa1_to_gfa2.py less more

16	16	"GFA2 requires to specify a length\n"+
17	17	"No length information available in the GFA1 segment:\n"+
18	18	"Segment line: {}".format(str(self)))
19		a = ["S", self.field_to_s("name", tag = False), str(self.try_get_length()),
	19	a = ["S", self.field_to_s("name", tag = False), str(length),
20	20	self.field_to_s("sequence", tag = False)]
21	21	for fn in self.tagnames:
22	22	if fn != "LN":

+0

-2

gfapy/lines/collections.py less more

0		import gfapy
1
2	0	class Collections:
3	1	@property
4	2	def comments(self):

+1

-0

gfapy/lines/creators.py less more

153	153	elif gfa_line.record_type in ["L", "P", "C", "#"]:
154	154	gfa_line.connect(self)
155	155	else:
	156	rt = gfa_line.record_type
156	157	raise gfapy.AssertionError(
157	158	"Invalid record type {}. This should never happen".format(rt))
158	159

+2

-4

gfapy/lines/destructors.py less more

21	21	other_end = gfapy.SegmentEnd(other_end)
22	22	s = self.try_get_segment(segment_end.segment)
23	23	for d in s.dovetails_of_end(segment_end.end_type):
24		if not conserve_components or not self.is_cut_link(l):
25		l.disconnect()
	24	if not conserve_components or not self.is_cut_link(d):
	25	d.disconnect()
26	26
27	27	def _unregister_line(self, gfa_line):
28	28	self._api_private_check_gfa_line(gfa_line, "unregister_line")

31	31	raise gfapy.AssertionError("Bug found, please report\n"+
32	32	"gfa_line: {}".format(gfa_line))
33	33	collection = self._records[rt]
34		key = gfa_line
35		delete_if_empty = None
36	34	storage_key = gfa_line.__class__.STORAGE_KEY
37	35	if storage_key == "name":
38	36	name = gfa_line.name

+0

-2

gfapy/lines/headers.py less more

0		import gfapy
1
2	0	class Headers:
3	1	@property
4	2	def header(self):

+0

-1

gfapy/lines/lines.py less more

1	1	import gfapy.line
2	2	from .collections import Collections
3	3	from .headers import Headers
4		from .collections import Collections
5	4	from .creators import Creators
6	5	from .destructors import Destructors
7	6	from .finders import Finders

+1

-1

gfapy/numeric_array.py less more

207	207	"Value: {1}\n"+
208	208	"Range: {2}\n"+
209	209	"Content: {3}").format(subtype, e,
210		repr(range), repr(array)))
	210	repr(range), repr(elems)))
211	211	yield e
212	212	else:
213	213	yield float(e)

+151

-0

gfapy.egg-info/PKG-INFO less more

	0	Metadata-Version: 1.1
	1	Name: gfapy
	2	Version: 1.2.0
	3	Summary: Library for handling data in the GFA1 and GFA2 formats
	4	Home-page: https://github.com/ggonnella/gfapy
	5	Author: Giorgio Gonnella and others (see CONTRIBUTORS)
	6	Author-email: gonnella@zbh.uni-hamburg.de
	7	License: ISC
	8	Description: Gfapy
	9	~~~~~
	10
	11	\|travis\| \|readthedocs\| \|latesttag\| \|license\| \|requiresio\|
	12
	13	\|bioconda\| \|pypi\| \|debian\| \|ubuntu\|
	14
	15	.. sphinx-begin
	16
	17	The Graphical Fragment Assembly (GFA) are formats for the representation
	18	of sequence graphs, including assembly, variation and splicing graphs.
	19	Two versions of GFA have been defined (GFA1 and GFA2) and several sequence
	20	analysis programs have been adopting the formats as an interchange format,
	21	which allow to easily combine different sequence analysis tools.
	22
	23	This library implements the GFA1 and GFA2 specification
	24	described at https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md.
	25	It allows to create a Gfa object from a file in the GFA format
	26	or from scratch, to enumerate the graph elements (segments, links,
	27	containments, paths and header lines), to traverse the graph (by
	28	traversing all links outgoing from or incoming to a segment), to search for
	29	elements (e.g. which links connect two segments) and to manipulate the
	30	graph (e.g. to eliminate a link or a segment or to duplicate a segment
	31	distributing the read counts evenly on the copies).
	32
	33	The GFA format can be easily extended by users by defining own custom
	34	tags and record types. In Gfapy, it is easy to write extensions modules,
	35	which allow to define custom record types and datatypes for the parsing
	36	and validation of custom fields. The custom lines can be connected, using
	37	references, to each other and to lines of the standard record types.
	38
	39	Requirements
	40	~~~~~~~~~~~~
	41
	42	Gfapy has been written for Python 3 and tested using Python version 3.7.
	43	It does not require any additional Python packages or other software.
	44
	45	Installation
	46	~~~~~~~~~~~~
	47
	48	Gfapy is distributed as a Python package and can be installed using
	49	the Python package manager pip, as well as conda (in the Bioconda channel).
	50	It is also available as a package in some Linux distributions (Debian, Ubuntu).
	51
	52	The following command installs the current stable version from the Python
	53	Packages index::
	54
	55	pip install gfapy
	56
	57	If you would like to install the current development version from Github,
	58	use the following command::
	59
	60	pip install -e git+https://github.com/ggonnella/gfapy.git#egg=gfapy
	61
	62	Alternatively it is possible to install gfapy using conda. Gfapy is
	63	included in the Bioconda (https://bioconda.github.io/) channel::
	64
	65	conda install -c bioconda gfapy
	66
	67	Usage
	68	~~~~~
	69
	70	If you installed gfapy as described above, you can import it in your script
	71	using the conventional Python syntax::
	72
	73	>>> import gfapy
	74
	75	Documentation
	76	~~~~~~~~~~~~~
	77
	78	The documentation, including this introduction to Gfapy, a user manual
	79	and the API documentation is hosted on the ReadTheDocs server,
	80	at the URL http://gfapy.readthedocs.io/en/latest/ and it can be
	81	downloaded as PDF from the URL
	82	https://github.com/ggonnella/gfapy/blob/master/manual/gfapy-manual.pdf.
	83
	84	References
	85	~~~~~~~~~~
	86
	87	Giorgio Gonnella and Stefan Kurtz "GfaPy: a flexible and extensible software
	88	library for handling sequence graphs in Python", Bioinformatics (2017) btx398
	89	https://doi.org/10.1093/bioinformatics/btx398
	90
	91	.. sphinx-end
	92
	93	.. \|travis\|
	94	image:: https://travis-ci.com/ggonnella/gfapy.svg?branch=master
	95	:target: https://travis-ci.com/ggonnella/gfapy
	96	:alt: Travis
	97
	98	.. \|latesttag\|
	99	image:: https://img.shields.io/github/v/tag/ggonnella/gfapy
	100	:target: https://github.com/ggonnella/gfapy/tags
	101	:alt: Latest GitHub tag
	102
	103	.. \|readthedocs\|
	104	image:: https://readthedocs.org/projects/pip/badge/?version=stable
	105	:target: https://pip.pypa.io/en/stable/?badge=stable
	106	:alt: ReadTheDocs
	107
	108	.. \|bioconda\|
	109	image:: https://img.shields.io/conda/vn/bioconda/gfapy
	110	:target: https://bioconda.github.io/recipes/gfapy/README.html
	111	:alt: Bioconda
	112
	113	.. \|pypi\|
	114	image:: https://img.shields.io/pypi/v/gfapy
	115	:target: https://pypi.org/project/gfapy/
	116	:alt: PyPI
	117
	118	.. \|debian\|
	119	image:: https://img.shields.io/debian/v/gfapy
	120	:target: https://packages.debian.org/search?keywords=gfapy
	121	:alt: Debian
	122
	123	.. \|ubuntu\|
	124	image:: https://img.shields.io/ubuntu/v/gfapy
	125	:target: https://packages.ubuntu.com/search?keywords=gfapy
	126	:alt: Ubuntu
	127
	128	.. \|license\|
	129	image:: https://img.shields.io/pypi/l/gfapy
	130	:target: https://github.com/ggonnella/gfapy/blob/master/LICENSE.txt
	131	:alt: ISC License
	132
	133	.. \|requiresio\|
	134	image:: https://requires.io/github/ggonnella/gfapy/requirements.svg?branch=master
	135	:target: https://requires.io/github/ggonnella/gfapy/requirements/?branch=master
	136	:alt: Requirements Status
	137
	138	Keywords: bioinformatics genomics sequences GFA assembly graphs
	139	Platform: UNKNOWN
	140	Classifier: Development Status :: 5 - Production/Stable
	141	Classifier: Environment :: Console
	142	Classifier: Intended Audience :: Developers
	143	Classifier: Intended Audience :: End Users/Desktop
	144	Classifier: Intended Audience :: Science/Research
	145	Classifier: License :: OSI Approved :: ISC License (ISCL)
	146	Classifier: Operating System :: MacOS :: MacOS X
	147	Classifier: Operating System :: POSIX :: Linux
	148	Classifier: Programming Language :: Python :: 3 :: Only
	149	Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
	150	Classifier: Topic :: Software Development :: Libraries

+335

-0

gfapy.egg-info/SOURCES.txt less more

	0	MANIFEST.in
	1	README.rst
	2	setup.cfg
	3	setup.py
	4	bin/gfapy-convert
	5	bin/gfapy-mergelinear
	6	bin/gfapy-renumber
	7	bin/gfapy-validate
	8	gfapy/__init__.py
	9	gfapy/byte_array.py
	10	gfapy/error.py
	11	gfapy/field_array.py
	12	gfapy/gfa.py
	13	gfapy/lastpos.py
	14	gfapy/logger.py
	15	gfapy/numeric_array.py
	16	gfapy/oriented_line.py
	17	gfapy/placeholder.py
	18	gfapy/rgfa.py
	19	gfapy/segment_end.py
	20	gfapy/segment_end_path.py
	21	gfapy/sequence.py
	22	gfapy/symbol_invert.py
	23	gfapy.egg-info/PKG-INFO
	24	gfapy.egg-info/SOURCES.txt
	25	gfapy.egg-info/dependency_links.txt
	26	gfapy.egg-info/not-zip-safe
	27	gfapy.egg-info/top_level.txt
	28	gfapy/alignment/__init__.py
	29	gfapy/alignment/alignment.py
	30	gfapy/alignment/cigar.py
	31	gfapy/alignment/placeholder.py
	32	gfapy/alignment/trace.py
	33	gfapy/field/__init__.py
	34	gfapy/field/alignment_gfa1.py
	35	gfapy/field/alignment_gfa2.py
	36	gfapy/field/alignment_list_gfa1.py
	37	gfapy/field/byte_array.py
	38	gfapy/field/char.py
	39	gfapy/field/comment.py
	40	gfapy/field/custom_record_type.py
	41	gfapy/field/field.py
	42	gfapy/field/float.py
	43	gfapy/field/generic.py
	44	gfapy/field/identifier_gfa2.py
	45	gfapy/field/identifier_list_gfa2.py
	46	gfapy/field/integer.py
	47	gfapy/field/json.py
	48	gfapy/field/numeric_array.py
	49	gfapy/field/optional_identifier_gfa2.py
	50	gfapy/field/optional_integer.py
	51	gfapy/field/orientation.py
	52	gfapy/field/oriented_identifier_gfa2.py
	53	gfapy/field/oriented_identifier_list_gfa1.py
	54	gfapy/field/oriented_identifier_list_gfa2.py
	55	gfapy/field/parser.py
	56	gfapy/field/path_name_gfa1.py
	57	gfapy/field/position_gfa1.py
	58	gfapy/field/position_gfa2.py
	59	gfapy/field/segment_name_gfa1.py
	60	gfapy/field/sequence_gfa1.py
	61	gfapy/field/sequence_gfa2.py
	62	gfapy/field/string.py
	63	gfapy/field/validator.py
	64	gfapy/field/writer.py
	65	gfapy/graph_operations/__init__.py
	66	gfapy/graph_operations/artifacts.py
	67	gfapy/graph_operations/copy_number.py
	68	gfapy/graph_operations/graph_operations.py
	69	gfapy/graph_operations/invertible_segments.py
	70	gfapy/graph_operations/linear_paths.py
	71	gfapy/graph_operations/multiplication.py
	72	gfapy/graph_operations/p_bubbles.py
	73	gfapy/graph_operations/redundant_linear_paths.py
	74	gfapy/graph_operations/superfluous_links.py
	75	gfapy/graph_operations/topology.py
	76	gfapy/line/__init__.py
	77	gfapy/line/line.py
	78	gfapy/line/comment/__init__.py
	79	gfapy/line/comment/comment.py
	80	gfapy/line/comment/construction.py
	81	gfapy/line/comment/tags.py
	82	gfapy/line/comment/version_conversion.py
	83	gfapy/line/comment/writer.py
	84	gfapy/line/common/__init__.py
	85	gfapy/line/common/cloning.py
	86	gfapy/line/common/connection.py
	87	gfapy/line/common/construction.py
	88	gfapy/line/common/default_record_definition.py
	89	gfapy/line/common/disconnection.py
	90	gfapy/line/common/dynamic_fields.py
	91	gfapy/line/common/equivalence.py
	92	gfapy/line/common/field_data.py
	93	gfapy/line/common/field_datatype.py
	94	gfapy/line/common/update_references.py
	95	gfapy/line/common/validate.py
	96	gfapy/line/common/version_conversion.py
	97	gfapy/line/common/virtual_to_real.py
	98	gfapy/line/common/writer.py
	99	gfapy/line/custom_record/__init__.py
	100	gfapy/line/custom_record/construction.py
	101	gfapy/line/custom_record/custom_record.py
	102	gfapy/line/edge/__init__.py
	103	gfapy/line/edge/edge.py
	104	gfapy/line/edge/common/__init__.py
	105	gfapy/line/edge/common/alignment_type.py
	106	gfapy/line/edge/common/from_to.py
	107	gfapy/line/edge/containment/__init__.py
	108	gfapy/line/edge/containment/canonical.py
	109	gfapy/line/edge/containment/containment.py
	110	gfapy/line/edge/containment/pos.py
	111	gfapy/line/edge/containment/to_gfa2.py
	112	gfapy/line/edge/gfa1/__init__.py
	113	gfapy/line/edge/gfa1/alignment_type.py
	114	gfapy/line/edge/gfa1/oriented_segments.py
	115	gfapy/line/edge/gfa1/other.py
	116	gfapy/line/edge/gfa1/references.py
	117	gfapy/line/edge/gfa1/to_gfa2.py
	118	gfapy/line/edge/gfa2/__init__.py
	119	gfapy/line/edge/gfa2/alignment_type.py
	120	gfapy/line/edge/gfa2/gfa2.py
	121	gfapy/line/edge/gfa2/other.py
	122	gfapy/line/edge/gfa2/references.py
	123	gfapy/line/edge/gfa2/to_gfa1.py
	124	gfapy/line/edge/gfa2/validation.py
	125	gfapy/line/edge/link/__init__.py
	126	gfapy/line/edge/link/canonical.py
	127	gfapy/line/edge/link/complement.py
	128	gfapy/line/edge/link/equivalence.py
	129	gfapy/line/edge/link/link.py
	130	gfapy/line/edge/link/references.py
	131	gfapy/line/edge/link/to_gfa2.py
	132	gfapy/line/fragment/__init__.py
	133	gfapy/line/fragment/fragment.py
	134	gfapy/line/fragment/references.py
	135	gfapy/line/fragment/validation.py
	136	gfapy/line/gap/__init__.py
	137	gfapy/line/gap/gap.py
	138	gfapy/line/gap/references.py
	139	gfapy/line/group/__init__.py
	140	gfapy/line/group/group.py
	141	gfapy/line/group/gfa2/__init__.py
	142	gfapy/line/group/gfa2/references.py
	143	gfapy/line/group/gfa2/same_id.py
	144	gfapy/line/group/ordered/__init__.py
	145	gfapy/line/group/ordered/captured_path.py
	146	gfapy/line/group/ordered/ordered.py
	147	gfapy/line/group/ordered/references.py
	148	gfapy/line/group/ordered/to_gfa1.py
	149	gfapy/line/group/path/__init__.py
	150	gfapy/line/group/path/captured_path.py
	151	gfapy/line/group/path/path.py
	152	gfapy/line/group/path/references.py
	153	gfapy/line/group/path/to_gfa2.py
	154	gfapy/line/group/path/topology.py
	155	gfapy/line/group/path/validation.py
	156	gfapy/line/group/unordered/__init__.py
	157	gfapy/line/group/unordered/induced_set.py
	158	gfapy/line/group/unordered/references.py
	159	gfapy/line/group/unordered/unordered.py
	160	gfapy/line/header/__init__.py
	161	gfapy/line/header/connection.py
	162	gfapy/line/header/field_data.py
	163	gfapy/line/header/header.py
	164	gfapy/line/header/multiline.py
	165	gfapy/line/header/version_conversion.py
	166	gfapy/line/segment/__init__.py
	167	gfapy/line/segment/coverage.py
	168	gfapy/line/segment/gfa1.py
	169	gfapy/line/segment/gfa1_to_gfa2.py
	170	gfapy/line/segment/gfa2.py
	171	gfapy/line/segment/gfa2_to_gfa1.py
	172	gfapy/line/segment/length_gfa1.py
	173	gfapy/line/segment/references.py
	174	gfapy/line/segment/segment.py
	175	gfapy/line/segment/writer_wo_sequence.py
	176	gfapy/line/unknown/__init__.py
	177	gfapy/line/unknown/unknown.py
	178	gfapy/lines/__init__.py
	179	gfapy/lines/collections.py
	180	gfapy/lines/creators.py
	181	gfapy/lines/destructors.py
	182	gfapy/lines/finders.py
	183	gfapy/lines/headers.py
	184	gfapy/lines/lines.py
	185	manual/gfapy-manual.pdf
	186	tests/__init__.py
	187	tests/extension.py
	188	tests/test_api_alignment.py
	189	tests/test_api_comments.py
	190	tests/test_api_custom_records.py
	191	tests/test_api_extensions.py
	192	tests/test_api_gfa1_lines.py
	193	tests/test_api_gfa2_lines.py
	194	tests/test_api_gfa_basics.py
	195	tests/test_api_groups_validation.py
	196	tests/test_api_header.py
	197	tests/test_api_linear_paths.py
	198	tests/test_api_linear_paths_extended.py
	199	tests/test_api_lines_collections.py
	200	tests/test_api_lines_creators.py
	201	tests/test_api_lines_destructors.py
	202	tests/test_api_lines_finders.py
	203	tests/test_api_multiplication.py
	204	tests/test_api_placeholders.py
	205	tests/test_api_positionals.py
	206	tests/test_api_positions.py
	207	tests/test_api_references_edge_gfa1.py
	208	tests/test_api_references_edge_gfa2.py
	209	tests/test_api_references_f_g_lines.py
	210	tests/test_api_references_groups.py
	211	tests/test_api_references_virtual.py
	212	tests/test_api_rename_lines.py
	213	tests/test_api_rgfa.py
	214	tests/test_api_tags.py
	215	tests/test_api_version.py
	216	tests/test_api_version_conversion.py
	217	tests/test_gfapy_alignment.py
	218	tests/test_gfapy_byte_array.py
	219	tests/test_gfapy_cigar.py
	220	tests/test_gfapy_line_containment.py
	221	tests/test_gfapy_line_edge.py
	222	tests/test_gfapy_line_header.py
	223	tests/test_gfapy_line_link.py
	224	tests/test_gfapy_line_path.py
	225	tests/test_gfapy_line_segment.py
	226	tests/test_gfapy_line_version.py
	227	tests/test_gfapy_numeric_array.py
	228	tests/test_gfapy_segment_references.py
	229	tests/test_gfapy_sequence.py
	230	tests/test_gfapy_trace.py
	231	tests/test_graphop_artifacts.py
	232	tests/test_graphop_copy_number.py
	233	tests/test_internals_field_parser.py
	234	tests/test_internals_field_validator.py
	235	tests/test_internals_field_writer.py
	236	tests/test_internals_tag_datatype.py
	237	tests/test_unit_alignment.py
	238	tests/test_unit_field_array.py
	239	tests/test_unit_gfa_lines.py
	240	tests/test_unit_header.py
	241	tests/test_unit_line.py
	242	tests/test_unit_line_cloning.py
	243	tests/test_unit_line_connection.py
	244	tests/test_unit_line_dynamic_fields.py
	245	tests/test_unit_line_equivalence.py
	246	tests/test_unit_lines_finders.py
	247	tests/test_unit_multiplication.py
	248	tests/test_unit_numeric_array.py
	249	tests/test_unit_oriented_line.py
	250	tests/test_unit_segment_end.py
	251	tests/test_unit_symbol_invert.py
	252	tests/test_unit_unknown.py
	253	tests/testdata/all_line_types.gfa1.gfa
	254	tests/testdata/all_line_types.gfa2.gfa
	255	tests/testdata/copynum.1.gfa
	256	tests/testdata/copynum.1.gfa2
	257	tests/testdata/copynum.2.gfa
	258	tests/testdata/copynum.2.gfa2
	259	tests/testdata/dead_ends.gfa
	260	tests/testdata/dead_ends.gfa2
	261	tests/testdata/example1.gfa
	262	tests/testdata/example1.gfa2
	263	tests/testdata/example_from_spec.gfa
	264	tests/testdata/example_from_spec.gfa2
	265	tests/testdata/example_from_spec.path14.seq
	266	tests/testdata/example_from_spec2.gfa
	267	tests/testdata/example_from_spec2.gfa2
	268	tests/testdata/filled.gfa1
	269	tests/testdata/filled.gfa2
	270	tests/testdata/gfa2_edges_classification.gfa
	271	tests/testdata/invalid_path.gfa2
	272	tests/testdata/linear_blunt.gfa1
	273	tests/testdata/linear_blunt.gfa2
	274	tests/testdata/linear_merging.1.gfa
	275	tests/testdata/linear_merging.1.gfa2
	276	tests/testdata/linear_merging.2.gfa
	277	tests/testdata/linear_merging.2.gfa2
	278	tests/testdata/linear_merging.3.gfa
	279	tests/testdata/linear_merging.3.gfa2
	280	tests/testdata/linear_merging.4.gfa
	281	tests/testdata/linear_merging.4.gfa2
	282	tests/testdata/linear_merging.5.gfa
	283	tests/testdata/linear_merging.5.gfa2
	284	tests/testdata/links_distri.l1.gfa
	285	tests/testdata/links_distri.l1.gfa2
	286	tests/testdata/links_distri.l1.m2.gfa
	287	tests/testdata/links_distri.l1.m2.gfa2
	288	tests/testdata/links_distri.l2.gfa
	289	tests/testdata/links_distri.l2.gfa2
	290	tests/testdata/links_distri.l2.m2.gfa
	291	tests/testdata/links_distri.l2.m2.gfa2
	292	tests/testdata/links_distri.l2.m2.no_ld.gfa
	293	tests/testdata/links_distri.l2.m2.no_ld.gfa2
	294	tests/testdata/links_distri.l2.m3.gfa
	295	tests/testdata/links_distri.l2.m3.gfa2
	296	tests/testdata/links_distri.l2.m3.no_ld.gfa
	297	tests/testdata/links_distri.l2.m3.no_ld.gfa2
	298	tests/testdata/links_distri.l3.gfa
	299	tests/testdata/links_distri.l3.gfa2
	300	tests/testdata/links_distri.l3.m2.gfa
	301	tests/testdata/links_distri.l3.m2.gfa2
	302	tests/testdata/links_distri.l3.m2.no_ld.gfa
	303	tests/testdata/links_distri.l3.m2.no_ld.gfa2
	304	tests/testdata/loop.gfa
	305	tests/testdata/loop.gfa2
	306	tests/testdata/rgfa_example.1.gfa
	307	tests/testdata/rgfa_example.2.gfa
	308	tests/testdata/sample.gfa
	309	tests/testdata/sample.gfa2
	310	tests/testdata/seq_to_fill.fas
	311	tests/testdata/spec_q1.gfa
	312	tests/testdata/spec_q1.gfa2
	313	tests/testdata/spec_q2.gfa
	314	tests/testdata/spec_q2.gfa2
	315	tests/testdata/spec_q2.path_circular.seq
	316	tests/testdata/spec_q2.path_linear.seq
	317	tests/testdata/spec_q3.gfa
	318	tests/testdata/spec_q3.gfa2
	319	tests/testdata/spec_q4.gfa
	320	tests/testdata/spec_q4.gfa2
	321	tests/testdata/spec_q4.path_more_than_circular.seq
	322	tests/testdata/spec_q5.gfa
	323	tests/testdata/spec_q5.gfa2
	324	tests/testdata/spec_q6.gfa
	325	tests/testdata/spec_q6.gfa2
	326	tests/testdata/spec_q7.gfa
	327	tests/testdata/spec_q7.gfa2
	328	tests/testdata/to_be_filled.gfa1
	329	tests/testdata/to_be_filled.gfa2
	330	tests/testdata/two_components.gfa
	331	tests/testdata/two_components.gfa2
	332	tests/testdata/unnamed_and_named_links.gfa
	333	tests/testdata/unnamed_link.gfa
	334	tests/testdata/valid_path.gfa2⏎

+1

-0

gfapy.egg-info/dependency_links.txt less more

0

+1

-0

gfapy.egg-info/not-zip-safe less more

0

+2

-0

gfapy.egg-info/top_level.txt less more

	0	gfapy
	1	tests

manual/gfapy-manual.pdf less more

Binary diff not shown

+5

-0

setup.cfg less more

0	0	[bdist_wheel]
1	1	python-tag = py3
	2
	3	[egg_info]
	4	tag_build =
	5	tag_date = 0
	6

+1

-1

setup.py less more

8	8	sys.exit("Sorry, only Python 3 is supported")
9	9
10	10	setup(name='gfapy',
11		version='1.1.0',
	11	version='1.2.0',
12	12	description='Library for handling data in the GFA1 and GFA2 formats',
13	13	long_description=readme(),
14	14	url='https://github.com/ggonnella/gfapy',

+17

-0

tests/test_api_groups_validation.py less more

	0	import gfapy
	1	import unittest
	2
	3	class TestAPIGroupsValidation(unittest.TestCase):
	4
	5	def test_invalid_path_gfa2(self):
	6	with self.assertRaises(gfapy.NotFoundError):
	7	g = gfapy.Gfa.from_file("tests/testdata/invalid_path.gfa2")
	8
	9	def test_invalid_path_gfa2_vlevel0(self):
	10	g = gfapy.Gfa.from_file("tests/testdata/invalid_path.gfa2", vlevel = 0)
	11	with self.assertRaises(gfapy.NotFoundError):
	12	g.validate()
	13
	14	def test_valid_path_gfa2(self):
	15	# nothing raised
	16	g = gfapy.Gfa.from_file("tests/testdata/valid_path.gfa2")

+7

-0

tests/test_api_linear_paths.py less more

46	46	lps.add(" ".join([s.name for s in lp]))
47	47	self.assertEqual({"1 19 18", "11 9 12", "22 16 20 21 23"}, lps)
48	48
	49	def test_linear_path_blunt_ends(self):
	50	for sfx in ["gfa1", "gfa2"]:
	51	gfa = gfapy.Gfa.from_file("tests/testdata/linear_blunt."+"{}".format(sfx))
	52	gfa.merge_linear_paths()
	53	self.assertEqual(1, len(gfa.segments))
	54	self.assertEqual("s1_s2_s3_s4", gfa.segments[0].name)
	55	self.assertEqual("CTGAAACGTGGCTCACA", gfa.segments[0].sequence)

+4

-4

tests/test_api_version_conversion.py less more

199	199	gfa1_str ='''# comment
200	200	H\tVN:Z:1.0
201	201	S\tB\t*\tLN:i:200
202		S\tC\t*\tLN:i:100
	202	S\tC\t*\tLN:i:100\txy:i:1
203	203	S\tA\t*\tLN:i:200
204	204	L\tA\t+\tB\t-\t100M\tID:Z:a_to_b
205	205	C\tA\t+\tC\t-\t20\t100M\tID:Z:2
206		P\t1\tA+,B-\t100M'''
	206	P\t1\tA+,B-\t100M\txx:Z:valid\txy:i:1'''
207	207	gfa2_str ='''# comment
208	208	H\tVN:Z:2.0
209	209	S\tB\t200\t*
210		S\tC\t100\t*
	210	S\tC\t100\t*\txy:i:1
211	211	S\tA\t200\t*
212	212	E\ta_to_b\tA+\tB-\t100\t200$\t100\t200$\t100M
213	213	E\t2\tA+\tC-\t20\t120\t0\t100$\t100M
214		O\t1\tA+ a_to_b+ B-'''
	214	O\t1\tA+ a_to_b+ B-\txx:Z:valid\txy:i:1'''
215	215	self.assertEqual(gfa2_str, str(gfapy.Gfa(gfa1_str).to_gfa2()))
216	216	self.assertEqual(gfa1_str, str(gfapy.Gfa(gfa2_str).to_gfa1()))

+10

-0

tests/testdata/filled.gfa1 less more

	0	H VN:Z:1.0
	1	S s1 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTTCCG LN:i:44
	2	S s2 ACGTTCGTACGTACGTACGTACGTAAACGGTAC LN:i:33
	3	S s3 ACGTAAACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTTTTTACGTACGTA LN:i:75
	4	S s4 ACGTACGTACGTTTTTTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGGAACGTAGTACGTACGTACGTACG LN:i:80
	5	S s5 * LN:i:90 xx:Z:not in the Fasta, should accept and warn
	6	L s1 + s2 + 5M1I2M
	7	L s2 + s3 + 8M1D3M
	8	L s3 + s4 + 16M1I9M
	9	L s4 - s1 + 8M1I16M

+10

-0

tests/testdata/filled.gfa2 less more

	0	H VN:Z:2.0
	1	S s1 44 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTTCCG
	2	S s2 33 ACGTTCGTACGTACGTACGTACGTAAACGGTAC
	3	S s3 75 ACGTAAACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTTTTTACGTACGTA
	4	S s4 80 ACGTACGTACGTTTTTTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGGAACGTAGTACGTACGTACGTACG
	5	S s5 90 * xx:Z:not in the Fasta, should accept and warn
	6	E 1 s1+ s2+ 37 44$ 0 8 5M1I2M
	7	E 2 s2+ s3+ 21 33$ 0 11 8M1D3M
	8	E 3 s3+ s4+ 50 75$ 0 26 16M1I9M
	9	E 4 s4- s1+ 0 24 0 25 8M1I16M

+0

-32

~~tests/testdata/invalid/edge_missing.gfa2~~ less more

0		# File used for the collections test
1		# similar but NOT equivalent to the gfa1 file!
2		S 1 122 *
3		S 3 29 TGCTAGCTGACTGTCGATGCTGTGTG
4		E 1_to_2 1+ 2+ 110 122$ 0 12 12M
5		S 5 130 *
6		S 13 150 *
7		E 2_to_6 2+ 6+ 0 122$ 10 132 122M
8		O 14 11+ 12+
9		S 11 140 * xx:i:11
10		F 2 read1+ 0 42 12 55 * id:Z:read1_in_2
11		F 2 read2+ 45 62 0 18 * id:Z:read2_in_2
12		U 16 1 3 15 2_to_6 16sub
13		H ac:Z:test2
14		# another comment
15		S 12 150 *
16		S 4 120 *
17		H VN:Z:2.0
18		E 1_to_3 1+ 3+ 112 122$ 0 12 10M
19		G 1_to_11 1+ 11- 120 *
20		E 11_to_12 11+ 12+ 18 140$ 0 122 122M
21		S 6 150 *
22		X custom_record xx:Z:testtag
23		X custom_record X2
24		G 2_to_12 2- 12+ 500 50
25		O 15 11+ 11_to_13+ 13+ xx:i:-1
26		Y another_custom_record
27		U 16sub 2 3
28		S 2 120 * xx:Z:sometag
29		H aa:i:12 ab:Z:test1
30		H aa:i:15
31		E 1_to_5 1+ 5+ 0 122$ 2 124 * zz:Z:tag

+0

-12

~~tests/testdata/invalid/edge_wrong_lastpos.gfa2~~ less more

0		H VN:Z:2.0
1		H ul:Z:https://github.com/sjackman/assembly-graph/blob/master/sample.gfa
2		S 1 8 CGATGCAA
3		S 2 10 TGCAAAGTAC
4		S 3 21 TGCAACGTATAGACTTGTCAC RC:i:4
5		S 4 7 GCATATA
6		S 5 8 CGATGATA
7		S 6 4 ATGA
8		E * 1+ 2+ 3 9$ 0 5 5M
9		E * 3+ 2+ 21$ 21$ 0 0 0M
10		E * 3+ 4- 17 21$ 3 7$ 1M1D2M
11		E * 4- 5+ 0 0 0 0 0M

+0

-33

~~tests/testdata/invalid/fragment_wrong_lastpos.gfa2~~ less more

0		# File used for the collections test
1		# similar but NOT equivalent to the gfa1 file!
2		S 1 122 *
3		S 3 29 TGCTAGCTGACTGTCGATGCTGTGTG
4		E 1_to_2 1+ 2+ 110 122$ 0 12 12M
5		S 5 130 *
6		S 13 150 *
7		E 2_to_6 2+ 6+ 0 122$ 10 132 122M
8		O 14 11+ 12+
9		S 11 140 * xx:i:11
10		F 3 read1+ 0 42$ 12 55 * id:Z:read1_in_3
11		F 2 read2+ 45 62 0 18 * id:Z:read2_in_2
12		U 16 1 3 15 2_to_6 16sub
13		H ac:Z:test2
14		# another comment
15		S 12 150 *
16		S 4 120 *
17		H VN:Z:2.0
18		E 1_to_3 1+ 3+ 112 122$ 0 12 10M
19		G 1_to_11 1+ 11- 120 *
20		E 11_to_12 11+ 12+ 18 140$ 0 122 122M
21		S 6 150 *
22		X custom_record xx:Z:testtag
23		X custom_record X2
24		E 11_to_13 11+ 13+ 20 140$ 0 120 120M
25		G 2_to_12 2- 12+ 500 50
26		O 15 11+ 11_to_13+ 13+ xx:i:-1
27		Y another_custom_record
28		U 16sub 2 3
29		S 2 120 * xx:Z:sometag
30		H aa:i:12 ab:Z:test1
31		H aa:i:15
32		E 1_to_5 1+ 5+ 0 122$ 2 124 * zz:Z:tag

+0

-12

~~tests/testdata/invalid/inconsistent_length.gfa1~~ less more

0		H VN:Z:1.0
1		H ul:Z:https://github.com/sjackman/assembly-graph/blob/master/sample.gfa
2		S 1 CGATGCAA LN:i:12
3		S 2 TGCAAAGTAC
4		S 3 TGCAACGTATAGACTTGTCAC RC:i:4
5		S 4 GCATATA
6		S 5 CGATGATA
7		S 6 ATGA
8		L 1 + 2 + 5M
9		L 3 + 2 + 0M
10		L 3 + 4 - 1M1D2M1S
11		L 4 - 5 + 0M

+0

-21

~~tests/testdata/invalid/link_missing.gfa1~~ less more

0		# File used for the collections test
1		S 1 *
2		S 3 CGATGCTAGCTGACTGTCGATGCTGTGTG
3		L 1 + 2 + 12M ID:Z:1_to_2
4		S 5 *
5		S 13 *
6		C 2 + 6 + 10 122M ID:Z:2_to_6
7		P 14 11+,12+ 122M
8		S 11 *
9		H ac:Z:test2
10		S 12 *
11		S 4 *
12		H VN:Z:1.0
13		L 1 + 3 + 12M ID:Z:1_to_3
14		S 6 *
15		L 11 + 13 + 120M ID:Z:11_to_13
16		P 15 11+,13+ 120M
17		S 2 * xx:Z:sometag
18		H aa:i:12 ab:Z:test1
19		H aa:i:15
20		C 1 + 5 + 12 120M ID:Z:1_to_5

+0

-21

~~tests/testdata/invalid/segment_missing.gfa1~~ less more

0		# comment
1		S 3 CGATGCTAGCTGACTGTCGATGCTGTGTG
2		L 1 + 2 + 12M ID:Z:1_to_2
3		S 5 *
4		S 13 *
5		C 2 + 6 + 10 122M ID:Z:2_to_6
6		P 14 11+,12+ 122M
7		S 11 *
8		H ac:Z:test2
9		S 12 *
10		S 4 *
11		H VN:Z:1.0
12		L 1 + 3 + 12M ID:Z:1_to_3
13		L 11 + 12 + 122M ID:Z:11_to_12
14		S 6 *
15		L 11 + 13 + 120M ID:Z:11_to_13
16		P 15 11+,13+ 120M
17		S 2 * xx:Z:sometag
18		H aa:i:12 ab:Z:test1
19		H aa:i:15
20		C 1 + 5 + 12 120M ID:Z:1_to_5

+0

-32

~~tests/testdata/invalid/segment_missing.gfa2~~ less more

0		# File used for the collections test
1		# similar but NOT equivalent to the gfa1 file!
2		S 3 29 TGCTAGCTGACTGTCGATGCTGTGTG
3		E 1_to_2 1+ 2+ 110 122$ 0 12 12M
4		S 5 130 *
5		S 13 150 *
6		E 2_to_6 2+ 6+ 0 122$ 10 132 122M
7		O 14 11+ 12+
8		S 11 140 * xx:i:11
9		F 2 read1+ 0 42 12 55 * id:Z:read1_in_2
10		F 2 read2+ 45 62 0 18 * id:Z:read2_in_2
11		U 16 1 3 15 2_to_6 16sub
12		H ac:Z:test2
13		# another comment
14		S 12 150 *
15		S 4 120 *
16		H VN:Z:2.0
17		E 1_to_3 1+ 3+ 112 122$ 0 12 10M
18		G 1_to_11 1+ 11- 120 *
19		E 11_to_12 11+ 12+ 18 140$ 0 122 122M
20		S 6 150 *
21		X custom_record xx:Z:testtag
22		X custom_record X2
23		E 11_to_13 11+ 13+ 20 140$ 0 120 120M
24		G 2_to_12 2- 12+ 500 50
25		O 15 11+ 11_to_13+ 13+ xx:i:-1
26		Y another_custom_record
27		U 16sub 2 3
28		S 2 120 * xx:Z:sometag
29		H aa:i:12 ab:Z:test1
30		H aa:i:15
31		E 1_to_5 1+ 5+ 0 122$ 2 124 * zz:Z:tag

+6

-0

tests/testdata/invalid_path.gfa2 less more

	0	H VN:Z:2.0
	1	S 1 6 AGCGTA
	2	S 2 6 TAACAG
	3	S 3 6 GCTAGT
	4	S 4 6 TCAGCG
	5	O P1 1+ 2+ 3+ 5+

+7

-0

tests/testdata/linear_blunt.gfa1 less more

	0	S s1 CTGAA
	1	S s2 ACG
	2	S s3 TGGC
	3	S s4 TGTGA
	4	L s1 + s2 + 0M
	5	L s2 + s3 + 0M
	6	L s3 + s4 - 0M

+7

-0

tests/testdata/linear_blunt.gfa2 less more

	0	S s1 5 CTGAA
	1	S s2 3 ACG
	2	S s3 4 TGGC
	3	S s4 5 TGTGA
	4	E 1 s1+ s2+ 5$ 5$ 0 0 0M
	5	E 2 s2+ s3+ 3$ 3$ 0 0 0M
	6	E 3 s3+ s4- 4$ 4$ 5$ 5$ 0M

+13

-0

tests/testdata/seq_to_fill.fas less more

	0	>s0 not in GFA, should accept and warn
	1	CGACAGCGATCGAGCATGCTAGCTAGTCGTAGCTAGCTAGCTA
	2	>s1
	3	ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTTCCG
	4	>s2 more text in description line should not disturb
	5	ACGTTCGTACGTACGT
	6	ACGTACGTAAACGGTAC
	7	>s3
	8	ACGTAAACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTTTTTACGTACGTA
	9	>s4
	10	ACGTACGTACGTTTTTTACGTACGTACGTA
	11	CGTACGTACGTACGTACGTACGTACGGAAC
	12	GTAGTACGTACGTACGTACG

+10

-0

tests/testdata/to_be_filled.gfa1 less more

	0	H VN:Z:1.0
	1	S s1 * LN:i:44
	2	S s2 * LN:i:33
	3	S s3 * LN:i:75
	4	S s4 * LN:i:80
	5	S s5 * LN:i:90 xx:Z:not in the Fasta, should accept and warn
	6	L s1 + s2 + 5M1I2M
	7	L s2 + s3 + 8M1D3M
	8	L s3 + s4 + 16M1I9M
	9	L s4 - s1 + 8M1I16M

+10

-0

tests/testdata/to_be_filled.gfa2 less more

	0	H VN:Z:2.0
	1	S s1 44 *
	2	S s2 33 *
	3	S s3 75 *
	4	S s4 80 *
	5	S s5 90 * xx:Z:not in the Fasta, should accept and warn
	6	E 1 s1+ s2+ 37 44$ 0 8 5M1I2M
	7	E 2 s2+ s3+ 21 33$ 0 11 8M1D3M
	8	E 3 s3+ s4+ 50 75$ 0 26 16M1I9M
	9	E 4 s4- s1+ 0 24 0 25 8M1I16M

+6

-0

tests/testdata/valid_path.gfa2 less more

	0	H VN:Z:2.0
	1	S 1 6 AGCGTA
	2	S 2 6 TAACAG
	3	S 3 6 GCTAGT
	4	S 4 6 TCAGCG
	5	O P1 1+ 2+ 3+ 4+