Commit bc39b8e85e21c1b571fcf99d589154150014700a - gfapy

+0

-13

~~.gitignore~~ less more

0		# Compiled python modules
1		*.pyc
2
3		# Setuptools distribution folder
4		/dist/
5
6		# Python egg metadata, regenerated from source files by setuptools
7		/*.egg-info
8		/*.egg
9
10		# Wheel data
11		build
12		conda

+0

-13

~~.travis.yml~~ less more

0		language: python
1		arch:
2		- amd64
3		- ppc64le
4		python:
5		- "3.7"
6		env:
7		- PYTHONHASHSEED=0
8		install:
9		- pip install .
10		- pip install nose
11		- pip install Sphinx
12		script: "make tests"

+0

-26

~~CHANGES.txt~~ less more

0		== 1.2.3 ==
1
2		- make it possible to count input header lines correctly
3
4		== 1.2.2 ==
5
6		- remove to/from aliases for GFA1 containment/link fields
7		since `to` potentially clashes with a tag name
8
9		== 1.2.1 ==
10
11		- fixed an issue with linear path merging (issue 21)
12		- GFA1 paths can contain a single segment only (issue 22)
13
14		== 1.2.0 ==
15
16		- fixed all open issues
17
18		== 1.1.0 ==
19
20		- fix: custom tags are not necessarily lower case
21		- additional support for rGFA subset of GFA1 by setting option dialect="rgfa"
22
23		== 1.0.0 ==
24
25		- initial release

+0

-6

~~CONTRIBUTORS~~ less more

0		The following contributors helped to develop gfapy. Please drop a note to
1		gonnella@zbh.uni-hamburg.de if I left someone out or missed something.
2
3		- Tim Weber (translation of parts of the code from Ruby to Python)
4		- Stefan Kurtz (advises)
5

+0

-51

~~Makefile~~ less more

0		default: tests
1
2		.PHONY: manual tests cleanup upload conda sdist wheel install
3
4		PYTHON=python3
5		PIP=pip3
6
7		# Install using pip
8		install:
9		${PIP} install --upgrade --user --editable .
10
11		# Source distribution
12		sdist:
13		${PYTHON} setup.py sdist
14
15		# Pure Python Wheel
16		wheel:
17		${PYTHON} setup.py bdist_wheel
18
19		# Create the manual
20		manual:
21		cd doc && make latexpdf
22		mkdir -p manual
23		cp doc/_build/latex/Gfapy.pdf manual/gfapy-manual.pdf
24
25		doctest:
26		cd doc && make doctest
27
28		unittests:
29		@echo
30		@echo "Running unit test suite..."
31		@PYTHONHASHSEED=0 ${PYTHON} -m unittest discover
32
33		tests: doctest unittests
34
35		# Remove distribution files
36		cleanup:
37		rm -rf dist/ build/ gfapy.egg-info/
38
39		upload: tests cleanup sdist wheel
40		cd dist; \
41		for file in *; do \
42		twine check $$file && \
43		twine upload $$file; \
44		done
45
46		conda:
47		mkdir -p conda
48		cd conda; \
49		conda skeleton pypi gfapy; \
50		conda build gfapy

+151

-0

PKG-INFO less more

	0	Metadata-Version: 2.1
	1	Name: gfapy
	2	Version: 1.2.3
	3	Summary: Library for handling data in the GFA1 and GFA2 formats
	4	Home-page: https://github.com/ggonnella/gfapy
	5	Author: Giorgio Gonnella and others (see CONTRIBUTORS)
	6	Author-email: gonnella@zbh.uni-hamburg.de
	7	License: ISC
	8	Keywords: bioinformatics genomics sequences GFA assembly graphs
	9	Classifier: Development Status :: 5 - Production/Stable
	10	Classifier: Environment :: Console
	11	Classifier: Intended Audience :: Developers
	12	Classifier: Intended Audience :: End Users/Desktop
	13	Classifier: Intended Audience :: Science/Research
	14	Classifier: License :: OSI Approved :: ISC License (ISCL)
	15	Classifier: Operating System :: MacOS :: MacOS X
	16	Classifier: Operating System :: POSIX :: Linux
	17	Classifier: Programming Language :: Python :: 3 :: Only
	18	Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
	19	Classifier: Topic :: Software Development :: Libraries
	20	License-File: LICENSE.txt
	21
	22	Gfapy
	23	~~~~~
	24
	25	\|travis\| \|readthedocs\| \|latesttag\| \|license\|
	26
	27	\|bioconda\| \|pypi\| \|debian\| \|ubuntu\|
	28
	29	.. sphinx-begin
	30
	31	The Graphical Fragment Assembly (GFA) are formats for the representation
	32	of sequence graphs, including assembly, variation and splicing graphs.
	33	Two versions of GFA have been defined (GFA1 and GFA2) and several sequence
	34	analysis programs have been adopting the formats as an interchange format,
	35	which allow to easily combine different sequence analysis tools.
	36
	37	This library implements the GFA1 and GFA2 specification
	38	described at https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md.
	39	It allows to create a Gfa object from a file in the GFA format
	40	or from scratch, to enumerate the graph elements (segments, links,
	41	containments, paths and header lines), to traverse the graph (by
	42	traversing all links outgoing from or incoming to a segment), to search for
	43	elements (e.g. which links connect two segments) and to manipulate the
	44	graph (e.g. to eliminate a link or a segment or to duplicate a segment
	45	distributing the read counts evenly on the copies).
	46
	47	The GFA format can be easily extended by users by defining own custom
	48	tags and record types. In Gfapy, it is easy to write extensions modules,
	49	which allow to define custom record types and datatypes for the parsing
	50	and validation of custom fields. The custom lines can be connected, using
	51	references, to each other and to lines of the standard record types.
	52
	53	Requirements
	54	~~~~~~~~~~~~
	55
	56	Gfapy has been written for Python 3 and tested using Python version 3.7.
	57	It does not require any additional Python packages or other software.
	58
	59	Installation
	60	~~~~~~~~~~~~
	61
	62	Gfapy is distributed as a Python package and can be installed using
	63	the Python package manager pip, as well as conda (in the Bioconda channel).
	64	It is also available as a package in some Linux distributions (Debian, Ubuntu).
	65
	66	The following command installs the current stable version from the Python
	67	Packages index::
	68
	69	pip install gfapy
	70
	71	If you would like to install the current development version from Github,
	72	use the following command::
	73
	74	pip install -e git+https://github.com/ggonnella/gfapy.git#egg=gfapy
	75
	76	Alternatively it is possible to install gfapy using conda. Gfapy is
	77	included in the Bioconda (https://bioconda.github.io/) channel::
	78
	79	conda install -c bioconda gfapy
	80
	81	Usage
	82	~~~~~
	83
	84	If you installed gfapy as described above, you can import it in your script
	85	using the conventional Python syntax::
	86
	87	>>> import gfapy
	88
	89	Documentation
	90	~~~~~~~~~~~~~
	91
	92	The documentation, including this introduction to Gfapy, a user manual
	93	and the API documentation is hosted on the ReadTheDocs server,
	94	at the URL http://gfapy.readthedocs.io/en/latest/ and it can be
	95	downloaded as PDF from the URL
	96	https://github.com/ggonnella/gfapy/blob/master/manual/gfapy-manual.pdf.
	97
	98	References
	99	~~~~~~~~~~
	100
	101	Giorgio Gonnella and Stefan Kurtz "GfaPy: a flexible and extensible software
	102	library for handling sequence graphs in Python", Bioinformatics (2017) btx398
	103	https://doi.org/10.1093/bioinformatics/btx398
	104
	105	.. sphinx-end
	106
	107	.. \|travis\|
	108	image:: https://travis-ci.com/ggonnella/gfapy.svg?branch=master
	109	:target: https://travis-ci.com/ggonnella/gfapy
	110	:alt: Travis
	111
	112	.. \|latesttag\|
	113	image:: https://img.shields.io/github/v/tag/ggonnella/gfapy
	114	:target: https://github.com/ggonnella/gfapy/tags
	115	:alt: Latest GitHub tag
	116
	117	.. \|readthedocs\|
	118	image:: https://readthedocs.org/projects/pip/badge/?version=stable
	119	:target: https://pip.pypa.io/en/stable/?badge=stable
	120	:alt: ReadTheDocs
	121
	122	.. \|bioconda\|
	123	image:: https://img.shields.io/conda/vn/bioconda/gfapy
	124	:target: https://bioconda.github.io/recipes/gfapy/README.html
	125	:alt: Bioconda
	126
	127	.. \|pypi\|
	128	image:: https://img.shields.io/pypi/v/gfapy
	129	:target: https://pypi.org/project/gfapy/
	130	:alt: PyPI
	131
	132	.. \|debian\|
	133	image:: https://img.shields.io/debian/v/gfapy
	134	:target: https://packages.debian.org/search?keywords=gfapy
	135	:alt: Debian
	136
	137	.. \|ubuntu\|
	138	image:: https://img.shields.io/ubuntu/v/gfapy
	139	:target: https://packages.ubuntu.com/search?keywords=gfapy
	140	:alt: Ubuntu
	141
	142	.. \|license\|
	143	image:: https://img.shields.io/pypi/l/gfapy
	144	:target: https://github.com/ggonnella/gfapy/blob/master/LICENSE.txt
	145	:alt: ISC License
	146
	147	.. \|requiresio\|
	148	image:: https://requires.io/github/ggonnella/gfapy/requirements.svg?branch=master
	149	:target: https://requires.io/github/ggonnella/gfapy/requirements/?branch=master
	150	:alt: Requirements Status

+0

-3

~~benchmarks/.gitignore~~ less more

0		benchmark_results*
1		jobs_out
2		figure*

+0

-65

~~benchmarks/gfapy-benchmark-collectdata~~ less more

0		#!/bin/bash
1
2		#
3		# This script is derived from rdj-spacepeak.sh in
4		# the GenomeTools repository (www.genometools.org).
5		#
6		# (c) 2010-2017 Giorgio Gonnella, ZBH, University of Hamburg
7		#
8
9		sleeptime=0.1
10
11		if [ $# -eq 0 ]; then
12		echo "Usage: $0 <command> [args]"
13		echo
14		echo "The following information is polled each $sleeptime seconds"
15		echo "from /proc/[pid]/status:"
16		echo
17		echo " VmPeak: Peak virtual memory size."
18		echo " VmSize: Virtual memory size."
19		echo " VmLck: Locked memory size."
20		echo " VmHWM: Peak resident set size (\"high water mark\")."
21		echo " VmRSS: Resident set size."
22		echo " VmData, VmStk, VmExe: Size of data, stack, and text segments."
23		echo " VmLib: Shared library code size."
24		echo " VmPTE: Page table entries size (since Linux 2.6.10)."
25		echo
26		echo "The command is run under /usr/bin/time."
27		exit
28		fi
29
30		# code inspired by:
31		# http://stackoverflow.com/questions/1080461/
32		# /peak-memory-measurement-of-long-running-process-in-linux
33		function __measure_space_peak {
34		types="Peak Size Lck HWM RSS Data Stk Exe Lib PTE"
35		declare -A maxVm
36		for vm in $types; do maxVm[$vm]=0; done
37		ppid=$$
38		/usr/bin/time $@ &
39		tpid=`pgrep -P ${ppid} -n -f time`
40		if [[ ${tpid} -ne "" ]]; then
41		pid=`pgrep -P ${tpid} -n -f $1` # $! may work here but not later
42		fi
43		declare -A Vm
44		while [[ ${tpid} -ne "" ]]; do
45		for vm in $types; do
46		if [[ ${pid} -ne "" ]]; then
47		Vm[$vm]=`cat /proc/${pid}/status 2> /dev/null \
48		\| grep Vm${vm} \| awk '{print $2}'`
49		if [[ ${Vm[$vm]} -gt ${maxVm[$vm]} ]]; then
50		maxVm[$vm]=${Vm[$vm]}
51		fi
52		fi
53		done
54		sleep $sleeptime
55		savedtpid=${tpid}
56		tpid=`pgrep -P ${ppid} -n -f time`
57		done
58		wait ${savedtpid} # don't wait, job is finished
59		exitstatus=$? # catch the exit status of wait, the same of $@
60		echo "Memory usage for $@:" >> /dev/stderr
61		for vm in $types; do echo " Vm$vm: ${maxVm[$vm]} kB" >> /dev/stderr; done
62		echo "Exit status: ${exitstatus}" >> /dev/stderr
63		}
64		__measure_space_peak $*

+0

-120

~~benchmarks/gfapy-plot-benchmarkdata.R~~ less more

0		#!/usr/bin/env Rscript
1		# (c) Giorgio Gonnella, ZBH, Uni Hamburg, 2017
2
3		script.name = "./gfapy-plot-benchmarkdata.R"
4		args <- commandArgs(trailingOnly=TRUE)
5		if (is.na(args[3])) {
6		cat("Usage: ",script.name, " <inputfile> <outpfx> <variable>", "\n")
7		cat("variable: either 'segments' or 'connectivity'\n")
8		stop("Too few command-line parameters")
9		}
10		infname <- args[1]
11		cat("input data: ",infname,"\n")
12		outpfx <- args[2]
13		cat("output prefix:", outpfx, "\n")
14		xvar <- args[3]
15		if (xvar != 'segments' && xvar != 'connectivity') {
16		stop("variable must be one of: segments, connectivity")
17		}
18
19		library("ggplot2")
20
21		#
22		# The following function is described here:
23		# http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper%20functions
24		# Licence: CC0 (https://creativecommons.org/publicdomain/zero/1.0/)
25		#
26		## Gives count, mean, standard deviation, standard error of the mean, and
27		## confidence interval (default 95%).
28		## data: a data frame.
29		## measurevar: the name of a column that contains the var to be summariezed
30		## groupvars: a vector containing names of columns that contain grouping vars
31		## na.rm: a boolean that indicates whether to ignore NA's
32		## conf.interval: the percent range of the confidence interval (default 95%)
33		summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
34		conf.interval=.95, .drop=TRUE) {
35		library(plyr)
36
37		# New version of length which can handle NA's: if na.rm==T, don't count them
38		length2 <- function (x, na.rm=FALSE) {
39		if (na.rm) sum(!is.na(x))
40		else length(x)
41		}
42
43		# This does the summary. For each group's data frame, return a vector with
44		# N, mean, and sd
45		datac <- ddply(data, groupvars, .drop=.drop,
46		.fun = function(xx, col) {
47		c(N = length2(xx[[col]], na.rm=na.rm),
48		mean = mean (xx[[col]], na.rm=na.rm),
49		sd = sd (xx[[col]], na.rm=na.rm)
50		)
51		},
52		measurevar
53		)
54
55		# Rename the "mean" column
56		datac <- rename(datac, c("mean" = measurevar))
57
58		datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
59
60		# Confidence interval multiplier for standard error
61		# Calculate t-statistic for confidence interval:
62		# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
63		ciMult <- qt(conf.interval/2 + .5, datac$N-1)
64		datac$ci <- datac$se * ciMult
65
66		return(datac)
67		}
68
69		data <- read.table(infname, header=T, sep="\t")
70
71		if (xvar == "segments") {
72		xvarname = "lines"
73		xlab="Lines (segments 1/3; dovetails 2/3)"
74		} else {
75		xvarname = "mult"
76		xlab="Dovetails/segment (segments=4000)"
77		data[c("lines")] = (data[c("mult")]+1)*4000
78		}
79
80		time.data <- summarySE(data, measurevar="time", groupvars=c(xvarname))
81		outfname = paste0(outpfx,"_time.log")
82		sink(outfname)
83		print(time.data)
84		time.lm <- lm(time ~ lines, data=data)
85		summary(time.lm)
86		time.nls <- nls(time ~ b + a * lines,
87		data=data, start=list(a=0,b=0),
88		algorithm="port", lower=c(0,0))
89		print(time.nls)
90		sink()
91
92		outfname = paste0(outpfx,"_space.log")
93		sink(outfname)
94		space.data <- summarySE(data, measurevar="space", groupvars=c(xvarname))
95		print(space.data)
96		space.lm <- lm(space ~ lines, data=data)
97		summary(space.lm)
98		space.nls <- nls(space ~ b + a * lines,
99		data=data, start=list(a=0,b=0),
100		algorithm="port", lower=c(0,0))
101		print(space.nls)
102		sink()
103
104		outfname = paste0(outpfx,"_time.pdf")
105		pdf(outfname)
106		print(ggplot(time.data, aes_string(x=xvarname, y="time")) +
107		geom_errorbar(aes(ymin=time-se, ymax=time+se), width=2) +
108		geom_line(size=0.2) + geom_point(size=3) +
109		ylab("Total elapsed time (s)") +
110		xlab(xlab))
111		outfname = paste0(outpfx,"_space.pdf")
112		pdf(outfname)
113		print(ggplot(space.data, aes_string(x=xvarname, y="space")) +
114		geom_errorbar(aes(ymin=space-se, ymax=space+se), width=2) +
115		geom_line(size=0.2) + geom_point(size=3) +
116		ylab("Memory peak (MB)") +
117		xlab(xlab))
118		dev.off()
119

+0

-62

~~benchmarks/gfapy-plot-preparedata.py~~ less more

0		#!/usr/bin/env python3
1		"""
2		Prepare the output of the convert benchmark script for the R plotting script.
3		"""
4
5		import argparse
6		import os
7		import sys
8		import re
9
10		op = argparse.ArgumentParser(description=__doc__)
11		op.add_argument('--version', action='version', version='%(prog)s 1.0')
12		op.add_argument("--mult", "-m", action="store_true",
13		help="set if variable n of edges/segment")
14		op.add_argument("inputfile")
15		opts = op.parse_args()
16
17		if not os.path.exists(opts.inputfile):
18		sys.stderr.write("Input file not found: {}\n".format(opts.inputfile))
19		exit(1)
20
21		with open(opts.inputfile) as inputfile:
22		header = True
23		if opts.mult:
24		outdata = ["mult", "time", "space", "time_per_line", "space_per_line"]
25		else:
26		outdata = ["lines", "time", "space", "time_per_line", "space_per_line"]
27		print("\t".join(outdata))
28		for line in inputfile:
29		if line[:3] == "###":
30		header = False
31		elif not header:
32		data = line.rstrip("\n\r").split("\t")
33		n_segments = data[2]
34		multiplier = data[3]
35		n_lines = int(int(n_segments) * (1+float(multiplier)))
36		elapsed = data[5]
37		elapsed_match = re.compile(r'\s+(\d+):(\d+\.\d+)').match(elapsed)
38		if elapsed_match:
39		minutes = int(elapsed_match.groups()[0])
40		seconds = float(elapsed_match.groups()[1])
41		seconds += minutes * 60
42		else:
43		elapsed_match = re.compile(r'\s+(\d+):(\d+):(\d+)').match(elapsed)
44		if elapsed_match:
45		hours = int(elapsed_match.groups()[0])
46		minutes = int(elapsed_match.groups()[1])
47		seconds = int(elapsed_match.groups()[2])
48		minutes += hours * 60
49		seconds += minutes * 60
50		else:
51		continue
52		memory = data[6]
53		memory = int(re.compile(r'(\d+) kB').match(memory).groups()[0])
54		megabytes = memory / 1024
55		if opts.mult:
56		outdata = [str(multiplier)]
57		else:
58		outdata = [str(n_lines)]
59		outdata += [str(seconds),str(megabytes),
60		str(seconds/n_lines), str(megabytes/n_lines)]
61		print("\t".join(outdata))

+0

-61

~~benchmarks/gfapy-profiler.sh~~ less more

0		#!/bin/bash
1		#$ -clear
2		#$ -q 16c.q
3		#$ -cwd
4		#$ -V
5		#$ -S /bin/bash
6		#$ -o jobs_out
7		#$ -j y
8
9		if [ $# -ne 4 ]; then
10		echo "Usage: $0 <operation> <version> <variable> <range>" > /dev/stderr
11		echo " operation: (mergelinear/convert) ../bin/gfapy-<operation> <gfafile> will be called" > /dev/stderr
12		echo " version: (gfa1/gfa2) gfa version" > /dev/stderr
13		echo " variable: (segments/connectivity)" > /dev/stderr
14		echo " range: (all/fast/slow)" > /dev/stderr
15		exit 1
16		fi
17
18		operation=$1
19		version=$2
20		variable=$3
21		range=$4
22
23		if [ $variable == "segments" ]; then
24		if [ $range == "fast" ]; then
25		nsegments="1000 2000 4000 8000 16000 32000 64000 128000"
26		elif [ $range == "slow" ]; then
27		nsegments="256000 512000 1024000 2048000 4096000"
28		elif [ $range == "all"]; then
29		nsegments="1000 2000 4000 8000 16000 32000 64000 128000 256000 512000 1024000 2048000 4096000"
30		fi
31		else
32		nsegments=4000
33		fi
34
35		if [ $variable == "connectivity" ]; then
36		if [ $range == "fast" ]; then
37		multipliers="2 4 8 16 32 64"
38		elif [ $range == "slow" ]; then
39		multipliers="128 256"
40		elif [ $range == "all"]; then
41		multipliers="2 4 8 16 32 64 128 256"
42		fi
43		else
44		multipliers=2
45		fi
46
47		replicate=1
48		for i in $nsegments; do
49		for m in $multipliers; do
50		fname="${i}_e${m}x.$replicate.${version}"
51		if [ ! -e $fname ]; then
52		./gfapy-randomgraph --segments $i -g $version \
53		--dovetails-per-segment $m --with-sequence > $fname
54		fi
55		echo "Profiling $operation $fname ..."
56		rm -f $fname.$operation.prof
57		python3 -m cProfile -o $fname.$operation.prof \
58		../bin/gfapy-$operation $fname 1> /dev/null
59		done
60		done

+0

-87

~~benchmarks/gfapy-randomgraph~~ less more

0		#!/usr/bin/env python3
1		"""
2		Creates a random graph for testing
3		"""
4
5		import argparse
6		import sys
7		import random
8
9		op = argparse.ArgumentParser(description=__doc__)
10		op.add_argument("--segments", "-s", type=int,
11		help="number of segments", required=True)
12		op.add_argument("--slen", "-l", type=int, default=100,
13		help="lenght of segments sequence")
14		op.add_argument("--with-sequence", "-w", action="store_true")
15		op.add_argument("--dovetails-per-segment", "-d",
16		help="average number of dovetail edges per segment",
17		default=2.0, type=float)
18		op.add_argument('--gfa-version', "-g", default="gfa1",
19		help="gfa version", choices=("gfa1", "gfa2"))
20		op.add_argument('--version', action='version', version='%(prog)s 1.0')
21		opts = op.parse_args()
22
23		if opts.segments < 0:
24		sys.stderr.write("Error: the number of segments must be "+
25		">= 0 ({})\n".format(opts.segments))
26		exit(1)
27		if opts.dovetails_per_segment < 0:
28		sys.stderr.write("Error: the average number of dovetails per segment must "+
29		"be >= 0 ({})\n".format(opts.dovetails_per_segment))
30		exit(1)
31		if opts.slen <= 0:
32		sys.stderr.write("Error: the length of segments sequence must be > 0"+
33		" ({})\n".format(opts.slen))
34		exit(1)
35
36		if opts.gfa_version == "gfa1":
37		print("H\tVN:Z:1.0")
38		else:
39		print("H\tVN:Z:2.0")
40
41		def random_sequence(slen):
42		sequence = []
43		for i in range(slen):
44		sequence.append(random.choice('ACGT'))
45		return "".join(sequence)
46
47		for i in range(opts.segments):
48		if opts.with_sequence:
49		sequence = random_sequence(opts.slen)
50		else:
51		sequence = "*"
52		if opts.gfa_version == "gfa1":
53		print("S\ts{}\t{}\tLN:i:{}".format(i, sequence, opts.slen))
54		else:
55		print("S\ts{}\t{}\t{}".format(i, opts.slen, sequence))
56
57		n_dovetails = int(opts.segments * opts.dovetails_per_segment)
58		edges = {}
59		for i in range(n_dovetails):
60		edge = False
61		while not edge:
62		s_from = random.randint(0, opts.segments-1)
63		s_from_or = random.choice('+-')
64		s_to = random.randint(0, opts.segments-1)
65		s_to_or = random.choice('+-')
66		if s_from not in edges:
67		edges[s_from] = {'+': {}, '-': {}}
68		if s_to not in edges[s_from][s_from_or]:
69		edges[s_from][s_from_or][s_to] = {'+': False, '-': False}
70		if not edges[s_from][s_from_or][s_to][s_to_or]:
71		edges[s_from][s_from_or][s_to][s_to_or] = True
72		edge = True
73		ovlen = opts.slen//10
74		if ovlen == 0: ovlen = 1
75		cigar = "{}M".format(ovlen)
76		if opts.gfa_version == "gfa1":
77		print("L\ts{}\t{}\ts{}\t{}\t{}\tID:Z:e{}".format(s_from, s_from_or, s_to,
78		s_to_or, cigar, i))
79		else:
80		s_from_begin = opts.slen - ovlen if s_from_or == "+" else 0
81		s_from_end = "{}$".format(opts.slen) if s_from_or == "+" else ovlen
82		s_to_begin = opts.slen - ovlen if s_to_or == "-" else 0
83		s_to_end = "{}$".format(opts.slen) if s_to_or == "-" else ovlen
84		print("E\te{}\ts{}{}\ts{}{}\t{}\t{}\t{}\t{}\t{}".format(
85		i, s_from, s_from_or, s_to, s_to_or, s_from_begin, s_from_end,
86		s_to_begin, s_to_end, cigar))

+0

-76

~~benchmarks/gfapy-reproduce-manuscript-figure.py~~ less more

0		#!/usr/bin/env python3
1		"""
2		Run the benchmarks necessary to reproduce the figures of Section 3
3		of the Supplementary Information of the manuscript \"Gfapy: a flexible
4		and extensible software library for handling sequence graphs in Python\"
5		and plots the figures using R.
6		"""
7
8		import argparse
9		import os
10
11		op = argparse.ArgumentParser(description=__doc__)
12		op.add_argument("fignum", help="Figure number", type=int,
13		choices=range(5,9))
14		op.add_argument("--queue", default=None,
15		help="Use the specified queue of a Grid Engine cluster system "+
16		"(e.g. 16c.q). If not provided, the benchmarks are run on the "+
17		"local computer.")
18		op.add_argument("--nrepl",type=int, default=3,
19		help="Number of replicates (default: 3)")
20		op.add_argument("--fast",action="store_true",
21		help="Run only the three fastest datapoints of the benchmark")
22		opts = op.parse_args()
23
24		if opts.fignum == 5:
25		testvar="segments"
26		operation="convert"
27		elif opts.fignum == 6:
28		testvar="connectivity"
29		operation="convert"
30		elif opts.fignum == 7:
31		testvar="segments"
32		operation="mergelinear"
33		else: # 8
34		testvar="connectivity"
35		operation="mergelinear"
36
37		if opts.fast:
38		subset="fast"
39		else:
40		subset="all"
41
42		run_benchmarks_args="figure{}.out {} gfa2 {} {} {}".format(
43		opts.fignum, operation, testvar, subset, opts.nrepl)
44
45		if not opts.queue:
46		os.system("./gfapy-run-benchmarks.sh {}".format(run_benchmarks_args))
47		else:
48		qsub_script_pfx=\
49		"""#!/bin/bash
50		#$ -clear
51		#$ -q {}
52		#$ -cwd
53		#$ -V
54		#$ -S /bin/bash
55		#$ -o jobs_out
56		#$ -j y
57		#$ -sync y
58
59		""".format(opts.queue)
60		with open("gfapy-run-benchmarks.sh", "r") as input_file:
61		content = input_file.read()
62		with open("gfapy-run-benchmarks.qsub", "w") as output_file:
63		output_file.write(qsub_script_pfx)
64		output_file.write(content)
65		os.system("mkdir -p jobs_out")
66		os.system("qsub gfapy-run-benchmarks.qsub {}".format(run_benchmarks_args))
67
68		if testvar == "segments":
69		prepareflag=""
70		else:
71		prepareflag="--mult"
72		os.system("./gfapy-plot-preparedata.py {} figure{}.out > figure{}.dat".format(
73		prepareflag, opts.fignum, opts.fignum))
74		os.system("./gfapy-plot-benchmarkdata.R figure{}.dat figure{} {}".format(
75		opts.fignum, opts.fignum, testvar))

+0

-67

~~benchmarks/gfapy-run-benchmarks.sh~~ less more

0		#!/bin/bash
1
2		if [ $# -ne 6 ]; then
3		echo "Usage: $0 <outfile> <operation> <version> <variable> <range> <nrepl>" > /dev/stderr
4		echo " outfile: will be overwritten if exists" > /dev/stderr
5		echo " operation: (mergelinear/convert) ../bin/gfapy-<operation> <gfafile> will be called" > /dev/stderr
6		echo " version: (gfa1/gfa2) gfa version" > /dev/stderr
7		echo " variable: (segments/connectivity)" > /dev/stderr
8		echo " range: (all/fast/slow)" > /dev/stderr
9		echo " nrepl: (e.g. 3) number of replicates" > /dev/stderr
10		exit 1
11		fi
12
13		outfile=$1
14		operation=$2
15		version=$3
16		variable=$4
17		range=$5
18		nrepl=$6
19
20		if [ $variable == "segments" ]; then
21		if [ $range == "fast" ]; then
22		nsegments="1000 2000 4000"
23		elif [ $range == "slow" ]; then
24		nsegments="8000 16000 32000 64000 128000 256000 512000 1024000 2048000"
25		elif [ $range == "all"]; then
26		nsegments="1000 2000 4000 8000 16000 32000 64000 128000 256000 512000 1024000 2048000"
27		fi
28		else
29		nsegments=4000
30		fi
31
32		if [ $variable == "connectivity" ]; then
33		if [ $range == "fast" ]; then
34		multipliers="2 4 8"
35		elif [ $range == "slow" ]; then
36		multipliers="16 32 64 128 256"
37		elif [ $range == "all"]; then
38		multipliers="2 4 8 16 32 64 128 256"
39		fi
40		else
41		multipliers=2
42		fi
43
44		mkdir -p benchmark_results
45		rm -f $outfile
46		echo "# hostname: $HOSTNAME" > $outfile
47		echo "### benchmark data:" >> $outfile
48		for ((replicate=1;replicate<=nrepl;++replicate)); do
49		for i in $nsegments; do
50		for m in $multipliers; do
51		fname="benchmark_results/${i}_e${m}x.$replicate.${version}"
52		bmout="$fname.$operation.benchmark"
53		rm -f $bmout
54		if [ ! -e $fname ]; then
55		./gfapy-randomgraph --segments $i -g $version \
56		--dovetails-per-segment $m --with-sequence > $fname
57		fi
58		./gfapy-benchmark-collectdata ../bin/gfapy-$operation $fname \
59		1> /dev/null 2> $bmout
60		elapsed=$(grep -P -o "(?<=) [^ ]*(?=elapsed)" $bmout)
61		memory=$(grep -P -o "(?<=VmHWM: ).*" $bmout)
62		filesize=( $(ls -ln $fname) );filesize=${filesize[4]}
63		echo -e "gfapy-$operation\t$version\t$i\t$m\t$replicate\t$elapsed\t$memory\t$filesize" >> $outfile
64		done
65		done
66		done

+0

-47

~~bin/gfapy-diff~~ less more

0		#!/usr/bin/env python3
1		"""
2		Compare two GFA files
3
4		Note: the current version is not yet functional and only checking segments.
5		Work in progress.
6		"""
7
8		import sys
9		import os
10		import gfapy
11		import argparse
12
13		op = argparse.ArgumentParser(description=__doc__)
14		op.add_argument('--version', action='version', version='%(prog)s 0.1')
15		op.add_argument("filename1")
16		op.add_argument("filename2")
17		opts = op.parse_args()
18
19		gfa1 = gfapy.Gfa.from_file(opts.filename1)
20		gfa2 = gfapy.Gfa.from_file(opts.filename2)
21
22		different = False
23
24		if gfa1.version != gfa2.version:
25		print("# different version")
26		exit(1)
27		else:
28		for s in gfa1.segments:
29		s2 = gfa2.segment(s)
30		if s2 is None:
31		different = True
32		print("# segment {} in {} but not in {}".format(s.name, opts.filename1, opts.filename2))
33		if s.diff(s2):
34		different = True
35		for diff in s.diff(s2):
36		print(diff)
37		for s in gfa2.segments:
38		s1 = gfa1.segment(s)
39		if s1 is None:
40		different = True
41		print("# segment {} in {} but not in {}".format(s.name, opts.filename2, opts.filename1))
42
43		if different:
44		exit(1)
45		else:
46		exit(0)

+0

-51

~~bin/gfapy-fillseq~~ less more

0		#!/usr/bin/env python3
1		"""
2		Add sequences from a Fasta file to a GFA file.
3		"""
4
5		import argparse
6		import sys
7		import gfapy
8
9		op = argparse.ArgumentParser(description=__doc__)
10		op.add_argument("inputgfa")
11		op.add_argument("inputfasta")
12		op.add_argument("-q", "--quiet", action="store_true", help="silence warnings")
13		op.add_argument("-v", "--verbose", action="store_true", help="verbose output")
14		op.add_argument("-V", '--version', action='version', version='%(prog)s 0.1')
15		opts = op.parse_args()
16
17		# note when applying to the output of older versions of Canu (1.6)
18		# the following fix to the GFA VN tag is necessary:
19		# sed -i s'/VN:Z:bogart\/edges/VN:Z:1.0/' canu.contigs.gfa
20
21		g = gfapy.Gfa.from_file(opts.inputgfa)
22
23		segment = None
24		slines = []
25		with open(opts.inputfasta) as f:
26		for line in f:
27		line = line.strip()
28		if line.startswith(">"):
29		if segment:
30		segment.sequence = "".join(slines)
31		sname = line[1:].split(" ")[0]
32		if opts.verbose:
33		sys.stderr.write("Processing segment {}...\n".format(sname))
34		segment = g.segment(sname)
35		if not opts.quiet and not segment:
36		sys.stderr.write("Warning: Segment with ID {} ".format(sname)+
37		"found in Fasta but not in GFA file\n")
38		slines = []
39		else:
40		slines.append(line)
41		if segment:
42		segment.sequence = "".join(slines)
43
44		if not opts.quiet:
45		for s in g.segments:
46		if s.sequence == gfapy.Placeholder:
47		sys.stderr.write("Warning: Segment with ID {} ".format(s.name)+
48		"found in GFA but not in Fasta file\n")
49
50		print(g)

+6

-0

debian/changelog less more

	0	gfapy (1.2.3+git20220408.1.12b31da+dfsg-1) UNRELEASED; urgency=low
	1
	2	* New upstream snapshot.
	3
	4	-- Debian Janitor <janitor@jelmer.uk> Thu, 03 Nov 2022 08:41:06 -0000
	5
0	6	gfapy (1.2.3+dfsg-1) unstable; urgency=medium
1	7
2	8	* New upstream release.

+0

-2

~~doc/.gitignore~~ less more

0		source
1		_build

+0

-23

~~doc/Makefile~~ less more

0		# Minimal makefile for Sphinx documentation
1		#
2
3		# You can set these variables from the command line.
4		SPHINXOPTS =
5		SPHINXBUILD = sphinx-build
6		SPHINXPROJ = Gfapy
7		SOURCEDIR = .
8		BUILDDIR = _build
9
10		# Put it first so that "make" without argument is like "make help".
11		help:
12		@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
13
14		.PHONY: help Makefile
15
16		cleanup:
17		rm source _build -rf
18
19		# Catch-all target: route all unknown targets to Sphinx using the new
20		# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
21		%: Makefile
22		@PYTHONHASHSEED=0 $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

+0

-4

~~doc/changelog.rst~~ less more

0		Changelog
1		---------
2		.. include:: ../CHANGES.txt
3		:literal:

+0

-173

~~doc/conf.py~~ less more

0		#!/usr/bin/env python3
1		# -- coding: utf-8 --
2		#
3		# Gfapy documentation build configuration file, created by
4		# sphinx-quickstart on Thu Mar 16 10:13:57 2017.
5		#
6		# This file is execfile()d with the current directory set to its
7		# containing dir.
8		#
9		# Note that not all possible configuration values are present in this
10		# autogenerated file.
11		#
12		# All configuration values have a default; values that are commented out
13		# serve to show the default.
14
15		# If extensions (or modules to document with autodoc) are in another directory,
16		# add these directories to sys.path here. If the directory is relative to the
17		# documentation root, use os.path.abspath to make it absolute, like shown here.
18		#
19		import os
20		import sys
21		sys.path.insert(0, os.path.abspath('.'))
22		sys.path.insert(0, os.path.abspath('../'))
23
24		# -- General configuration ------------------------------------------------
25
26		# If your documentation needs a minimal Sphinx version, state it here.
27		#
28		# needs_sphinx = '1.0'
29
30		# Add any Sphinx extension module names here, as strings. They can be
31		# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
32		# ones.
33		extensions = [
34		'sphinx.ext.autodoc',
35		'sphinx.ext.doctest',
36		'sphinx.ext.todo',
37		'sphinx.ext.coverage',
38		'sphinx.ext.imgmath',
39		'sphinx.ext.ifconfig',
40		'sphinx.ext.viewcode',
41		'sphinx.ext.githubpages',
42		'sphinx.ext.napoleon'
43		]
44
45		# Napoleon
46		napoleon_numpy_docstring = True
47		napoleon_google_docstring = True
48		napoleon_use_param = False
49		napoleon_use_ivar = True
50
51		# Default role:
52		default_role = 'any'
53
54		# Add any paths that contain templates here, relative to this directory.
55		templates_path = ['_templates']
56
57		# The suffix(es) of source filenames.
58		# You can specify multiple suffix as a list of string:
59		#
60		# source_suffix = ['.rst', '.md']
61		source_suffix = '.rst'
62
63		# The master toctree document.
64		master_doc = 'index'
65
66		# General information about the project.
67		project = 'Gfapy'
68		copyright = '2017--2022, Giorgio Gonnella and others (see CONTRIBUTORS)'
69		author = 'Giorgio Gonnella and others (see CONTRIBUTORS)'
70
71		# The version info for the project you're documenting, acts as replacement for
72		# \|version\| and \|release\|, also used in various other places throughout the
73		# built documents.
74		#
75		# The short X.Y version.
76		version = '1.2'
77		# The full version, including alpha/beta/rc tags.
78		release = '1.2.3'
79
80		# The language for content autogenerated by Sphinx. Refer to documentation
81		# for a list of supported languages.
82		#
83		# This is also used if you do content translation via gettext catalogs.
84		# Usually you set "language" from the command line for these cases.
85		language = None
86
87		# List of patterns, relative to source directory, that match files and
88		# directories to ignore when looking for source files.
89		# This patterns also effect to html_static_path and html_extra_path
90		exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
91
92		# The name of the Pygments (syntax highlighting) style to use.
93		pygments_style = 'sphinx'
94
95		# If true, `todo` and `todoList` produce output, else they produce nothing.
96		todo_include_todos = False
97
98		# -- Options for HTML output ----------------------------------------------
99
100		# The theme to use for HTML and HTML Help pages. See the documentation for
101		# a list of builtin themes.
102		#
103		html_theme = 'sphinx_rtd_theme'
104
105		# Theme options are theme-specific and customize the look and feel of a theme
106		# further. For a list of options available for each theme, see the
107		# documentation.
108		#
109		# html_theme_options = {}
110
111		# Add any paths that contain custom static files (such as style sheets) here,
112		# relative to this directory. They are copied after the builtin static files,
113		# so a file named "default.css" will overwrite the builtin "default.css".
114		html_static_path = ['_static']
115
116
117		# -- Options for HTMLHelp output ------------------------------------------
118
119		# Output file base name for HTML help builder.
120		htmlhelp_basename = 'Gfapydoc'
121
122
123		# -- Options for LaTeX output ---------------------------------------------
124
125		latex_elements = {
126		# The paper size ('letterpaper' or 'a4paper').
127		#
128		'papersize': 'a4paper',
129
130		# The font size ('10pt', '11pt' or '12pt').
131		#
132		# 'pointsize': '10pt',
133
134		# Additional stuff for the LaTeX preamble.
135		#
136		# 'preamble': '',
137
138		# Latex figure (float) alignment
139		#
140		# 'figure_align': 'htbp',
141		}
142
143		# Grouping the document tree into LaTeX files. List of tuples
144		# (source start file, target name, title,
145		# author, documentclass [howto, manual, or own class]).
146		latex_documents = [
147		(master_doc, 'Gfapy.tex', 'Gfapy Documentation',
148		'Giorgio Gonnella', 'manual'),
149		]
150
151		# -- Options for manual page output ---------------------------------------
152
153		# One entry per manual page. List of tuples
154		# (source start file, name, description, authors, manual section).
155		man_pages = [
156		(master_doc, 'gfapy', 'Gfapy Documentation',
157		[author], 1)
158		]
159
160
161		# -- Options for Texinfo output -------------------------------------------
162
163		# Grouping the document tree into Texinfo files. List of tuples
164		# (source start file, target name, title, author,
165		# dir menu entry, description, category)
166		texinfo_documents = [
167		(master_doc, 'Gfapy', 'Gfapy Documentation',
168		author, 'Gfapy',
169		'Python library for the Graphic Fragment Assembly (GFA) format.',
170		'Miscellaneous'),
171		]
172

+0

-36

~~doc/index.rst~~ less more

0		.. Gfapy documentation master file, created by
1		sphinx-quickstart on Thu Mar 16 10:13:57 2017.
2		You can adapt this file completely to your liking, but it should at least
3		contain the root `toctree` directive.
4
5		Gfapy documentation
6		===================
7
8		.. toctree::
9		:maxdepth: 2
10		:caption: Contents:
11
12		readme
13		changelog
14
15		tutorial/gfa
16		tutorial/validation
17		tutorial/positional_fields
18		tutorial/placeholders
19		tutorial/positions
20		tutorial/alignments
21		tutorial/tags
22		tutorial/references
23		tutorial/header
24		tutorial/custom_records
25		tutorial/comments
26		tutorial/errors
27		tutorial/graph_operations
28		tutorial/rgfa
29
30		Indices and tables
31		==================
32
33		* :ref:`genindex`
34		* :ref:`modindex`
35		* :ref:`search`

+0

-5

~~doc/readme.rst~~ less more

0		Introduction
1		============
2		.. include:: ../README.rst
3		:start-after: sphinx-begin
4		:end-before: sphinx-end

+0

-1

~~doc/run_apidoc.sh~~ less more

0

sphinx-apidoc -o source/ ../gfapy

+0

-238

~~doc/tutorial/alignments.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		from gfapy import is_placeholder, Alignment
4		h = "H\tVN:Z:2.0\tTS:i:100"
5		sA = "S\tA\t100\t*"
6		sB = "S\tB\t100\t*"
7		x = "E\tx\tA+\tB-\t0\t100$\t0\t100$\t4,2\tTS:i:50"
8		gfa = gfapy.Gfa([h, sA, sB, x])
9
10		.. _alignments:
11
12		Alignments
13		~~~~~~~~~~
14
15		Some GFA1 (L/C overlap, P overlaps) and GFA2 (E/F alignment) fields contain
16		alignments or lists of alignments. The alignment can be left unspecified and a
17		placeholder symbol ``*`` used instead. In GFA1 the alignments can be given as
18		CIGAR strings, in GFA2 also as Dazzler traces.
19
20		Gfapy uses three different classes for representing the content of alignment fields:
21		:class:`~gfapy.alignment.cigar.CIGAR`, :class:`~gfapy.alignment.trace.Trace`
22		and :class:`~gfapy.alignment.placeholder.AlignmentPlaceholder`.
23
24		Creating an alignment
25		^^^^^^^^^^^^^^^^^^^^^
26
27		An alignment instance is usually created from its GFA string
28		representation or from a list by using the
29		:class:`gfapy.Alignment() <gfapy.alignment.alignment.Alignment>`
30		constructor.
31
32		.. doctest::
33
34		>>> from gfapy import Alignment
35		>>> Alignment("*")
36		gfapy.AlignmentPlaceholder()
37		>>> Alignment("10,10,10")
38		gfapy.Trace([10,10,10])
39		>>> Alignment([10,10,10])
40		gfapy.Trace([10,10,10])
41		>>> Alignment("30M2I")
42		gfapy.CIGAR([gfapy.CIGAR.Operation(30,'M'), gfapy.CIGAR.Operation(2,'I')])
43
44		If the argument is an alignment object it will be returned,
45		so that is always safe to call the method on a variable which can
46		contain a string or an alignment instance:
47
48		.. doctest::
49
50		>>> Alignment(Alignment("*"))
51		gfapy.AlignmentPlaceholder()
52		>>> Alignment(Alignment("10,10"))
53		gfapy.Trace([10,10])
54
55		Recognizing undefined alignments
56		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
57
58		The :func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>` method
59		allows to test if an alignment field contains an undefined value (placeholder)
60		instead of a defined value (CIGAR string, trace). The method accepts as
61		argument either an alignment object or a string or list representation.
62
63		.. doctest::
64
65		>>> from gfapy import is_placeholder, Alignment
66		>>> is_placeholder(Alignment("30M"))
67		False
68		>>> is_placeholder(Alignment("10,10"))
69		False
70		>>> is_placeholder(Alignment("*"))
71		True
72		>>> is_placeholder("*")
73		True
74		>>> is_placeholder("30M")
75		False
76		>>> is_placeholder("10,10")
77		False
78		>>> is_placeholder([])
79		True
80		>>> is_placeholder([10,10])
81		False
82
83		Note that, as a placeholder is ``False`` in boolean context, just a
84		``if not aligment`` will also work, if alignment is an alignment object.
85		But this of course, does not work, if it is a string representation.
86		Therefore it is better to use the
87		:func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>` method,
88		which works in both cases.
89
90		.. doctest::
91
92		>>> if not Alignment("*"): print('no alignment')
93		no alignment
94		>>> if is_placeholder(Alignment("*")): print('no alignment')
95		no alignment
96		>>> if "*": print('not a placeholder...?')
97		not a placeholder...?
98		>>> if is_placeholder("*"): print('really? it is a placeholder!')
99		really? it is a placeholder!
100
101		Reading and editing CIGARs
102		^^^^^^^^^^^^^^^^^^^^^^^^^^
103
104		CIGARs are represented by specialized lists, instances of the class
105		:class:`~gfapy.alignment.cigar.CIGAR`, whose elements are CIGAR operations
106		CIGAR operations are represented by instance of the class
107		:class:`~gfapy.alignment.cigar.CIGAR.Operation`,
108		and provide the properties ``length`` (length of the operation, an integer)
109		and ``code`` (one-letter string which specifies the type of operation).
110		Note that not all operations allowed in SAM files (for which CIGAR strings
111		were first defined) are also meaningful in GFA and thus GFA2 only allows
112		the operations ``M``, ``I``, ``D`` and ``P``.
113
114		.. doctest::
115
116		>>> cigar = gfapy.Alignment("30M")
117		>>> isinstance(cigar, list)
118		True
119		>>> operation = cigar[0]
120		>>> type(operation)
121		<class 'gfapy.alignment.cigar.CIGAR.Operation'>
122		>>> operation.code
123		'M'
124		>>> operation.code = 'D'
125		>>> operation.length
126		30
127		>>> len(operation)
128		30
129		>>> str(operation)
130		'30D'
131
132		As a CIGAR instance is a list, list methods apply to it. If the array is
133		emptied, its string representation will be the placeholder symbol ``*``.
134
135		.. doctest::
136
137		>>> cigar = gfapy.Alignment("1I20M2D")
138		>>> cigar[0].code = "M"
139		>>> cigar.pop(1)
140		gfapy.CIGAR.Operation(20,'M')
141		>>> str(cigar)
142		'1M2D'
143		>>> cigar[:] = []
144		>>> str(cigar)
145		'*'
146
147		The validate :func:`CIGAR.validate() <gfapy.alignment.cigar.CIGAR.validate>`
148		function checks if a CIGAR instance is valid. A version can be provided, as the
149		CIGAR validation is version specific (as GFA2 forbids some CIGAR operations).
150
151		.. doctest::
152
153		>>> cigar = gfapy.Alignment("30M10D20M5I10M")
154		>>> cigar.validate()
155		>>> cigar[1].code = "L"
156		>>> cigar.validate()
157		Traceback (most recent call last):
158		...
159		gfapy.error.ValueError:
160		>>> cigar = gfapy.Alignment("30M10D20M5I10M")
161		>>> cigar[1].code = "X"
162		>>> cigar.validate(version="gfa1")
163		>>> cigar.validate(version="gfa2")
164		Traceback (most recent call last):
165		...
166		gfapy.error.ValueError:
167
168		Reading and editing traces
169		^^^^^^^^^^^^^^^^^^^^^^^^^^
170
171		Traces are arrays of non-negative integers. The values are interpreted
172		using a trace spacing value. If traces are used, a trace spacing value
173		must be defined in a TS integer tag, either in the header, or in the
174		single lines which contain traces (which takes precedence over the
175		header global value).
176
177		.. doctest::
178
179		>>> print(gfa) #doctest: +SKIP
180		H TS:i:100
181		E x A+ B- 0 100$ 0 100$ 4,2 TS:i:50
182		...
183		>>> gfa.header.TS
184		100
185		>>> gfa.line("x").TS
186		50
187
188		Query, reference and complement
189		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
190
191		CIGARs are asymmetric, i.e.\ they consider one sequence as reference and
192		another sequence as query.
193
194		The :func:`~gfapy.alignment.cigar.CIGAR.length_on_reference` and
195		:func:`~gfapy.alignment.cigar.CIGAR.length_on_query` methods compute the length
196		of the alignment on the two sequences. These methods are used by the library
197		e.g. to convert GFA1 L lines to GFA2 E lines (which is only possible if CIGARs
198		are provided).
199
200		.. doctest::
201
202		>>> cigar = gfapy.Alignment("30M10D20M5I10M")
203		>>> cigar.length_on_reference()
204		70
205		>>> cigar.length_on_query()
206		65
207
208		CIGARs are dependent on which sequence is taken as reference and which
209		is taken as query. For each alignment, a complement CIGAR can be
210		computed using the method
211		:func:`~gfapy.alignment.cigar.CIGAR.complement`; it is the CIGAR obtained
212		when the two sequences are switched.
213
214		.. doctest::
215
216		>>> cigar = gfapy.Alignment("2M1D3M")
217		>>> str(cigar.complement())
218		'3M1I2M'
219
220		The current version of Gfapy does not provide a way to compute the
221		alignment, thus the trace information can be accessed and edited, but
222		not used for this purpose. Because of this there is currently no way in
223		Gfapy to compute a complement trace (trace obtained when the sequences
224		are switched).
225
226		.. doctest::
227
228		>>> trace = gfapy.Alignment("1,2,3")
229		>>> str(trace.complement())
230		'*'
231
232		The complement of a placeholder is a placeholder:
233
234		.. doctest::
235
236		>>> str(gfapy.Alignment("*").complement())
237		'*'

+0

-71

~~doc/tutorial/comments.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		g = gfapy.Gfa()
4
5		.. _comments:
6
7		Comments
8		--------
9
10		GFA lines starting with a ``#`` symbol are considered comments. In Gfapy
11		comments are represented by instances of the class :class:`gfapy.line.Comment
12		<gfapy.line.comment.comment.Comment>`. They have a similar interface to other
13		line instances, with some differences, e.g. they do not support tags.
14
15		The comments collection
16		~~~~~~~~~~~~~~~~~~~~~~~
17
18		The comments of a Gfa object are accessed using the :func:`Gfa.comments
19		<gfapy.lines.collections.Collections.comments>` property. This is a list of
20		comment line instances. The single elements can be modified, but the list
21		itself is read-only. To remove a comment from the Gfa, you need to find the
22		instance in the list, and call
23		:func:`~gfapy.line.common.disconnection.Disconnection.disconnect` on it. To
24		add a comment to a :class:`~gfapy.gfa.Gfa` instance is done similarly to other
25		lines, by using the :func:`Gfa.add_line(line)
26		<gfapy.lines.creators.Creators.add_line>` method.
27
28		.. doctest::
29
30		>>> g.add_line("# this is a comment") #doctest: +ELLIPSIS
31		>>> [str(c) for c in g.comments]
32		['# this is a comment']
33		>>> g.comments[0].disconnect()
34		>>> g.comments
35		[]
36
37		Accessing the comment content
38		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
39
40		The content of the comment line, excluding the initial ``#`` and eventual
41		initial spacing characters, is included in the ``content`` field. The initial
42		spacing characters can be read/changed using the ``spacer`` field. The default
43		value is a single space.
44
45		.. doctest::
46
47		>>> g.add_line("# this is a comment") #doctest: +ELLIPSIS
48		>>> c = g.comments[-1]
49		>>> c.content
50		'this is a comment'
51		>>> c.spacer
52		' '
53		>>> c.spacer = '___'
54		>>> str(c)
55		'#___this is a comment'
56
57		Tags are not supported by comment lines. If the line contains tags,
58		these are nor parsed, but included in the ``content`` field. Trying to set
59		tags raises exceptions.
60
61		.. doctest::
62
63		>>> c = gfapy.Line("# this is not a tag\txx:i:1")
64		>>> c.content
65		'this is not a tag\txx:i:1'
66		>>> c.xx
67		>>> c.xx = 1
68		Traceback (most recent call last):
69		...
70		gfapy.error.RuntimeError: Tags of comment lines cannot be set

+0

-296

~~doc/tutorial/custom_records.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		g = gfapy.Gfa(version = 'gfa2')
4
5		.. _custom_records:
6
7		Custom records
8		--------------
9
10		The GFA2 specification considers each line which starts with a non-standard
11		record type a custom (i.e. user- or program-specific) record.
12		Gfapy allows to retrieve these records and access their data using a
13		similar interface to that for the predefined record types.
14
15		Retrieving, adding and deleting custom records
16		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17
18		Gfa instances have the property
19		:func:`~gfapy.lines.collections.Collections.custom_records`,
20		a list of all line instances with a non-standard record type. Among these,
21		records of a specific record type are retrieved using the method
22		:func:`Gfa.custom_records_of_type(record_type)
23		<gfapy.lines.collections.Collections.custom_records_of_type>`.
24		Lines are added and deleted using the same methods
25		(:func:`~gfapy.lines.creators.Creators.add_line` and
26		:func:`~gfapy.line.common.disconnection.Disconnection.disconnect`) as for
27		other line types.
28
29		.. doctest::
30
31		>>> g.add_line("X\tcustom line") #doctest: +ELLIPSIS
32		>>> g.add_line("Y\tcustom line") #doctest: +ELLIPSIS
33		>>> [str(line) for line in g.custom_records] #doctest: +SKIP
34		['X\tcustom line', 'Y\tcustom line']
35		>>> g.custom_record_keys) #doctest: +SKIP
36		['X', 'Y']
37		>>> [str(line) for line in g.custom_records_of_type('X')]
38		['X\tcustom line']
39		>>> g.custom_records_of_type("X")[-1].disconnect()
40		>>> g.custom_records_of_type('X')
41		[]
42
43		Interface without extensions
44		~~~~~~~~~~~~~~~~~~~~~~~~~~~~
45
46		If no extension (see :ref:`extensions` section) has been defined to handle a
47		custom record type, the interface has some limitations: the field content is
48		not validated, and the field names are unknown. The generic custom record
49		class is employed
50		(:class:`~gfapy.line.custom_record.custom_record.CustomRecord`).
51
52		As the name of the positional fields in a custom record is not known, a generic
53		name ``field1``, ``field2``, ... is used. The number of positional fields is
54		found by getting the length of the
55		:attr:`~gfapy.line.custom_record.init.Init.positional_fieldnames` list.
56
57		.. doctest::
58
59		>>> g.add_line("X\ta\tb\tcc:i:10\tdd:i:100") #doctest: +ELLIPSIS
60		>>> x = g.custom_records_of_type('X')[-1]
61		>>> len(x.positional_fieldnames)
62		2
63		>>> x.field1
64		'a'
65		>>> x.field2
66		'b'
67
68		Positional fields are allowed to contain any character (including non-printable
69		characters and spacing characters), except tabs and newlines (as they are
70		structural elements of the line). No further validation is performed.
71
72		As Gfapy cannot know how many positional fields are present when parsing custom
73		records, a heuristic approach is followed, to identify tags. A field resembles
74		a tag if it starts with ``tn:d:`` where ``tn`` is a valid tag name and ``d`` a
75		valid tag datatype (see :ref:`tags` chapter). The fields are parsed from the
76		last to the first.
77
78		As soon as a field is found which does not resemble a tag, all remaining fields
79		are considered positionals (even if another field parsed later resembles a
80		tag). Due to this, invalid tags are sometimes wrongly taken as positional
81		fields (this can be avoided by writing an extension).
82
83		.. doctest::
84
85		>>> g.add_line("X\ta\tb\tcc:i:10\tdd:i:100") #doctest: +ELLIPSIS
86		>>> x1 = g.custom_records_of_type("X")[-1]
87		>>> x1.cc
88		10
89		>>> x1.dd
90		100
91		>>> g.add_line("X\ta\tb\tcc:i:10\tdd:i:100\te") #doctest: +ELLIPSIS
92		>>> x2 = g.custom_records_of_type("X")[-1]
93		>>> x2.cc
94		>>> x2.field3
95		'cc:i:10'
96		>>> g.add_line("Z\ta\tb\tcc:i:10\tddd:i:100") #doctest: +ELLIPSIS
97		>>> x3 = g.custom_records_of_type("Z")[-1]
98		>>> x3.cc
99		>>> x3.field3
100		'cc:i:10'
101		>>> x3.field4
102		'ddd:i:100'
103
104		.. _extensions:
105
106		Extensions
107		~~~~~~~~~~
108
109		The support for custom fields is limited, as Gfapy does not know which and how
110		many fields are there and how shall they be validated. It is possible to create
111		an extension of Gfapy, which defines new record types: this will allow to use
112		these record types in a similar way to the built-in types.
113
114		As an example, an extension will be described, which defines two record types:
115		T for taxa and M for assignments of segments to taxa. For further information
116		about the possible usage case for this extension, see the Supplemental
117		Information to the manuscript describing Gfapy.
118
119		The T records will contain a single positional field, ``tid``, a GFA2
120		identifier, and an optional UL string tag. The M records will contain three
121		positional fields (all three GFA2 identifier): a name field ``mid`` (optional),
122		and two references, ``tid`` to a T line and ``sid`` to an S line. The SC
123		integer tag will be also defined. Here is an example of a GFA containing M and
124		T lines:
125
126		.. code::
127
128		S sA 1000 *
129		S sB 1000 *
130		M assignment1 t123 sA SC:i:40
131		M assignment2 t123 sB
132		M * B12c sB SC:i:20
133		T B12c
134		T t123 UL:Z:http://www.taxon123.com
135
136		Writing subclasses of the :class:`~gfapy.line.line.Line` class, it is possible to
137		communicate to Gfapy, how records of the M and T class shall be handled. This
138		only requires to define some constants and to call the class method
139		:func:`~gfapy.line.line.Line.register_extension`.
140
141		The constants to define are ``RECORD TYPE``, which shall be the content
142		of the record type field (e.g. ``M``); ``POSFIELDS`` shall contain an ordered
143		dict, specifying the datatype for each positional field, in the order these
144		fields are found in the line; ``TAGS_DATATYPE`` is a dict, specifying the
145		datatype of the predefined optional tags; ``NAME_FIELD`` is a field name,
146		and specifies which field contains the identifier of the line.
147		For details on predefined and custom datatypes, see the next sections
148		(:ref:`predefined_datatypes` and :ref:`custom_datatypes`).
149
150		To handle references, :func:`~gfapy.line.line.Line.register_extension`
151		can be supplied with a ``references`` parameter, a list of triples
152		``(fieldname, classname, backreferences)``. Thereby ``fieldname`` is the name
153		of the field in the corresponding record containing the reference (e.g.
154		``sid``), ``classname`` is the name of the class to which the reference goes
155		(e.g. ``gfa.line.segment.GFA2``), and \texttt{backreferences} is how the
156		collection of backreferences shall be called, in the records to which reference
157		points to (e.g. ``metagenomic_assignments``).
158
159		.. code:: python
160
161		from collections include OrderedDict
162
163		class Taxon(gfapy.Line):
164		RECORD_TYPE = "T"
165		POSFIELDS = OrderedDict([("tid","identifier_gfa2")])
166		TAGS_DATATYPE = {"UL":"Z"}
167		NAME_FIELD = "tid"
168
169		Taxon.register_extension()
170
171		class MetagenomicAssignment(gfapy.Line):
172		RECORD_TYPE = "M"
173		POSFIELDS = OrderedDict([("mid","optional_identifier_gfa2"),
174		("tid","identifier_gfa2"),
175		("sid","identifier_gfa2")])
176		TAGS_DATATYPE = {"SC":"i"}
177		NAME_FIELD = "mid"
178
179		MetagenomicAssignment.register_extension(references=
180		[("sid", gfapy.line.segment.GFA2, "metagenomic_assignments"),
181		("tid", Taxon, "metagenomic_assignments")])
182
183		.. _predefined_datatypes:
184
185		Predefined datatypes for extensions
186		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
187
188		The datatype of fields is specified in Gfapy using classes, which provide
189		functions for decoding, encoding and validating the corresponding data.
190		Gfapy contains a number of datatypes which correspond to the description
191		of the field content in the GFA1 and GFA2 specification.
192
193		When writing extensions only the GFA2 field datatypes are generally used
194		(as GFA1 does not contain custom fields). They are summarized in
195		the following table:
196
197		+-------------------------------------+---------------+--------------------------------------------------------+
198		\| Name \| Example \| Description \|
199		+=====================================+===============+========================================================+
200		\| ``alignment_gfa2`` \| ``12M1I3M`` \| CIGAR string, Trace alignment or Placeholder (``*``) \|
201		+-------------------------------------+---------------+--------------------------------------------------------+
202		\| ``identifier_gfa2`` \| ``S1`` \| ID of a line \|
203		+-------------------------------------+---------------+--------------------------------------------------------+
204		\| ``oriented_identifier_gfa2`` \| ``S1+`` \| ID of a line followed by ``+`` or ``-`` \|
205		+-------------------------------------+---------------+--------------------------------------------------------+
206		\| ``optional_identifier_gfa2`` \| ```` \| ID of a line or Placeholder (````) \|
207		+-------------------------------------+---------------+--------------------------------------------------------+
208		\| ``identifier_list_gfa2`` \| ``S1 S2`` \| space separated list of line IDs \|
209		+-------------------------------------+---------------+--------------------------------------------------------+
210		\| ``oriented_identifier_list_gfa2`` \| ``S1+ S2-`` \| space separated list of line IDs plus orientations \|
211		+-------------------------------------+---------------+--------------------------------------------------------+
212		\| ``position_gfa2`` \| ``120$`` \| non-negative integer, optionally followed by ``$`` \|
213		+-------------------------------------+---------------+--------------------------------------------------------+
214		\| ``sequence_gfa2`` \| ``ACGNNYR`` \| sequence of printable chars., no whitespace \|
215		+-------------------------------------+---------------+--------------------------------------------------------+
216		\| ``string`` \| ``a b_c;d`` \| string, no tabs and newlines (Z tags) \|
217		+-------------------------------------+---------------+--------------------------------------------------------+
218		\| ``char`` \| ``A`` \| single character (A tags) \|
219		+-------------------------------------+---------------+--------------------------------------------------------+
220		\| ``float`` \| ``1.12`` \| float (f tags) \|
221		+-------------------------------------+---------------+--------------------------------------------------------+
222		\| ``integer`` \| ``-12`` \| integer (i tags) \|
223		+-------------------------------------+---------------+--------------------------------------------------------+
224		\| ``optional_integer`` \| ``*`` \| integer or placeholder \|
225		+-------------------------------------+---------------+--------------------------------------------------------+
226		\| ``numeric_array`` \| ``c,10,3`` \| array of integers or floats (B tags) \|
227		+-------------------------------------+---------------+--------------------------------------------------------+
228		\| ``byte_array`` \| ``12F1FF`` \| hexadecimal byte string (H tags) \|
229		+-------------------------------------+---------------+--------------------------------------------------------+
230		\| ``json`` \| ``{’b’:2}`` \| JSON string, no tabs and newlines (J tags) \|
231		+-------------------------------------+---------------+--------------------------------------------------------+
232
233		.. _custom_datatypes:
234
235		Custom datatypes for extensions
236		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
237
238		For custom records, one sometimes needs datatypes not yet available in the GFA
239		specification. For example, a custom datatype can be defined for
240		the taxon identifier used in the ``tid`` field of the T and M records:
241		accordingly the taxon identifier shall be only either
242		in the form ``taxon:<n>``, where ``<n>`` is a positive integer,
243		or consist of letters, numbers and underscores only
244		(without ``:``).
245
246		To define the datatype, a class is written, which contains the following
247		functions:
248
249		* ``validate_encoded(string)``: validates the content of the field,
250		if this is a string (e.g., the name of the T line)
251		* ``validate_decoded(object)``: validates the content of the field,
252		if this is not a string (e.g., a reference to a T line)
253		* ``decode(string)``: validates the content of the field (a string)
254		and returns the decoded content; note that references must not be resolved
255		(there is no access to the Gfa instance here), thus the name of the
256		T line will be returned unchanged
257		* ``encode(string)``: validates the content of the field (not in string
258		form) and returns the string which codes it in the GFA file (also here
259		references are validated but not converted into strings)
260
261		Finally the datatype is registered calling
262		:func:`~gfapy.field.field.Field.register_datatype`. The code for
263		the taxon ID extension is the following:
264
265		.. code:: python
266
267		import re
268
269		class TaxonID:
270
271		def validate_encoded(string):
272		if not re.match(r"^taxon:(\d+)$",string) and \
273		not re.match(r"^[a-zA-Z0-9_]+$", string):
274		raise gfapy.ValueError("Invalid taxon ID: {}".format(string))
275
276		def decode(string):
277		TaxonID.validate_encoded(string)
278		return string
279
280		def validate_decoded(obj):
281		if isinstance(obj,Taxon):
282		TaxonID.validate_encoded(obj.name)
283		else:
284		raise gfapy.TypeError(
285		"Invalid type for taxon ID: "+"{}".format(repr(obj)))
286
287		def encode(obj):
288		TaxonID.validate_decoded(obj)
289		return obj
290
291		gfapy.Field.register_datatype("taxon_id", TaxonID)
292
293		To use the new datatype in the T and M lines defined above (:ref:`extensions`),
294		the definition of the two subclasses can be changed:
295		in ``POSFIELDS`` the value ``taxon_id`` shall be assigned to the key ``tid``.

+0

-43

~~doc/tutorial/errors.rst~~ less more

0		.. _errors:
1
2		Errors
3		------
4
5		The different types of errors defined in Gfapy are summarized in the
6		following table. All exception raised in the library are subclasses of
7		`Error`. Thus, ``except gfapy.Error`` can be use to catch
8		all library errors.
9
10		+-----------------------+-------------------------------+---------------------------------+
11		\| Error \| Description \| Examples \|
12		+=======================+===============================+=================================+
13		\| `VersionError` \| An unknown or wrong version \| "GFA0"; or GFA1 in GFA2 context \|
14		\| \| is specified or implied \| \|
15		+-----------------------+-------------------------------+---------------------------------+
16		\| `ValueError` \| The value of an object is \| a negative position is used \|
17		\| \| invalid \| \|
18		+-----------------------+-------------------------------+---------------------------------+
19		\| `TypeError` \| The wrong type has been used \| Z instead of i used for VN tag; \|
20		\| \| or specified \| Hash for an i tag \|
21		+-----------------------+-------------------------------+---------------------------------+
22		\| `FormatError` \| The format of an object is \| a line does not contain the \|
23		\| \| wrong \| expected number of fields \|
24		+-----------------------+-------------------------------+---------------------------------+
25		\| `NotUniqueError` \| Something should be unique \| duplicated tag name or line \|
26		\| \| but is not \| identifier \|
27		+-----------------------+-------------------------------+---------------------------------+
28		\| `InconsistencyError` \| Pieces of information collide \| length of sequence and LN tag \|
29		\| \| with each other \| do not match \|
30		+-----------------------+-------------------------------+---------------------------------+
31		\| `RuntimeError` \| The user tried to do \| editing from_segment field in \|
32		\| \| something which is not \| connected links \|
33		\| \| allowed \| \|
34		+-----------------------+-------------------------------+---------------------------------+
35		\| `ArgumentError` \| Problem with the arguments of \| wrong number of arguments in \|
36		\| \| a method \| dynamically created method \|
37		+-----------------------+-------------------------------+---------------------------------+
38		\| `AssertionError` \| Something unexpected happened \| there is a bug in the library or\|
39		\| \| \| the library has been used in \|
40		\| \| \| an unintended way \|
41		+-----------------------+-------------------------------+---------------------------------+
42

+0

-409

~~doc/tutorial/gfa.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4		gfa1 = gfapy.Gfa()
5		gfa1.add_line("H\tVN:Z:1.0")
6		gfa1.add_line("# this is a comment")
7		gfa1.add_line("S\t1\t*")
8		gfa1.add_line("S\t2\t*")
9		gfa1.add_line("S\t3\t*")
10		gfa2 = gfapy.Gfa()
11		gfa2.add_line("H\tVN:Z:2.0\tTS:i:100")
12		gfa2.add_line("X\tcustom line")
13		gfa2.add_line("Y\tcustom line")
14
15		.. _gfa:
16
17		The Gfa class
18		-------------
19
20		The content of a GFA file is represented in Gfapy by an instance of the class
21		:class:`~gfapy.gfa.Gfa`. In most cases, the Gfa instance will be constructed
22		from the data contained in a GFA file, using the method
23		:func:`Gfa.from_file() <gfapy.gfa.Gfa.from_file>`.
24
25		Alternatively, it is possible to use the construct of the class; it takes an
26		optional positional parameter, the content of a GFA file (as string, or as list
27		of strings, one per line of the GFA file). If no GFA content is provided, the
28		Gfa instance will be empty.
29
30		.. doctest::
31
32		>>> gfa = gfapy.Gfa("H\tVN:Z:1.0\nS\tA\t*")
33		>>> print(len(gfa.lines))
34		2
35		>>> gfa = gfapy.Gfa(["H\tVN:Z:1.0", "S\tA\t", "S\tB\t"])
36		>>> print(len(gfa.lines))
37		3
38		>>> gfa = gfapy.Gfa()
39		>>> print(len(gfa.lines))
40		0
41
42		The string representation of the Gfa object (which can be obtained using
43		``str()``) is the textual representation in GFA format.
44		Using :func:`Gfa.to_file(filename) <gfapy.gfa.Gfa.to_file>` allows
45		writing this representation to a GFA file (the content of the file is
46		overwritten).
47
48		.. doctest::
49
50		>>> g1 = gfapy.Gfa()
51		>>> g1.append("H\tVN:Z:1.0")
52		>>> g1.append("S\ta\t*")
53		>>> g1.to_file("my.gfa") #doctest: +SKIP
54		>>> g2 = gfapy.Gfa.from_file("my.gfa") #doctest: +SKIP
55		>>> str(g1)
56		'H\tVN:Z:1.0\nS\ta\t*'
57
58
59		All methods for creating a Gfa (constructor and from_file) accept
60		a ``vlevel`` parameter, the validation level,
61		and can assume the values 0, 1, 2 and 3. A higher value means
62		more validations are performed. The :ref:`validation` chapter explains
63		the meaning of the different validation levels in detail.
64		The default value is 1.
65
66		.. doctest::
67
68		>>> gfapy.Gfa().vlevel
69		1
70		>>> gfapy.Gfa(vlevel = 0).vlevel
71		0
72
73		A further parameter is ``version``. It can be set to ``'gfa1'``,
74		``'gfa2'`` or left to the default value (``None``). The default
75		is to auto-detect the version of the GFA from the line content.
76		If the version is set manually, any content not compatible to the
77		specified version will trigger an exception. If the version is
78		set automatically, an exception will be raised if two lines
79		are found, with content incompatible to each other (e.g. a GFA1
80		segment followed by a GFA2 segment).
81
82		.. doctest::
83
84		>>> g = gfapy.Gfa(version='gfa2')
85		>>> g.version
86		'gfa2'
87		>>> g.add_line("S\t1\t*")
88		Traceback (most recent call last):
89		...
90		gfapy.error.VersionError: Version: 1.0 (None)
91		...
92		>>> g = gfapy.Gfa()
93		>>> g.version
94		>>> g.add_line("S\t1\t*")
95		>>> g.version
96		'gfa1'
97		>>> g.add_line("S\t1\t100\t*")
98		Traceback (most recent call last):
99		...
100		gfapy.error.VersionError: Version: 1.0 (None)
101		...
102
103		Collections of lines
104		~~~~~~~~~~~~~~~~~~~~
105
106		The property :attr:`~gfapy.lines.collections.Collections.lines`
107		of the Gfa object is a list of all the lines
108		in the GFA file (including the header, which is split into single-tag
109		lines). The list itself shall not be modified by the user directly (i.e.
110		adding and removing lines is done using a different interface, see
111		below). However the single elements of the list can be edited.
112
113		.. doctest::
114
115		>>> for line in gfa.lines: print(line)
116
117		For most record types, a list of the lines of the record type is available
118		as a read-only property, which is named after the record type, in plural.
119
120		.. doctest::
121
122		>>> [str(line) for line in gfa1.segments]
123		['S\t1\t', 'S\t2\t', 'S\t3\t*']
124		>>> [str(line) for line in gfa2.fragments]
125		[]
126
127		A particular case are edges; these are in GFA1 links and containments, while in
128		GFA2 there is a unified edge record type, which also allows to represent
129		internal alignments. In Gfapy, the
130		:attr:`~gfapy.lines.collections.Collections.edges` property retrieves all edges
131		(i.e. all E lines in GFA2, and all L and C lines in GFA1). The
132		:attr:`~gfapy.lines.collections.Collections.dovetails` property is a list of
133		all edges which represent dovetail overlaps (i.e. all L lines in GFA1 and a
134		subset of the E lines in GFA2). The
135		:attr:`~gfapy.lines.collections.Collections.containments` property is a list of
136		all edges which represent containments (i.e. all C lines in GFA1 and a subset
137		of the E lines in GFA2).
138
139		.. doctest::
140
141		>>> gfa2.edges
142		[]
143		>>> gfa2.dovetails
144		[]
145		>>> gfa2.containments
146		[]
147
148		Paths are retrieved using the
149		:attr:`~gfapy.lines.collections.Collections.paths` property. This list
150		contains all P lines in GFA1 and all O lines in GFA2. Sets returns the list of
151		all U lines in GFA2 (empty list in GFA1).
152
153		.. doctest::
154
155		>>> gfa2.paths
156		[]
157		>>> gfa2.sets
158		[]
159
160		The header contain metadata in a single or multiple lines. For ease of
161		access to the header information, all its tags are summarized in a
162		single line instance, which is retrieved using the
163		:attr:`~gfapy.lines.headers.Headers.header` property. This list
164		The :ref:`header` chapter of this manual explains more in
165		detail, how to work with the header object.
166
167		.. doctest::
168
169		>>> gfa2.header.TS
170		100
171
172		All lines which start by the string ``#`` are comments; they are handled in
173		the :ref:`comments` chapter and are retrieved using the
174		:attr:`~gfapy.lines.collections.Collections.comments` property.
175
176		.. doctest::
177
178		>>> [str(line) for line in gfa1.comments]
179		['# this is a comment']
180
181		Custom lines are lines of GFA2 files which start
182		with a non-standard record type. Gfapy provides basic built-in support
183		for accessing the information in custom lines, and allows to define
184		extensions for own record types for defining more advanced
185		functionality (see the :ref:`custom_records` chapter).
186
187		.. doctest::
188
189		>>> [str(line) for line in gfa2.custom_records]
190		['X\tcustom line', 'Y\tcustom line']
191		>>> gfa2.custom_record_keys
192		['X', 'Y']
193		>>> [str(line) for line in gfa2.custom_records_of_type('X')]
194		['X\tcustom line']
195
196		Line identifiers
197		~~~~~~~~~~~~~~~~
198
199		Some GFA lines have a mandatory or optional identifier field: segments and
200		paths in GFA1, segments, gaps, edges, paths and sets in GFA2. A line of this
201		type can be retrieved by identifier, using the method
202		:func:`Gfa.line(ID) <gfapy.gfa.Gfa.line>` using the identifier as argument.
203
204		.. doctest::
205
206		>>> str(gfa1.line('1'))
207		'S\t1\t*'
208
209		The GFA2 specification prescribes the exact namespace for the identifier
210		(segments, paths, sets, edges and gaps identifier share the same namespace).
211		The content of this namespace can be retrieved using the
212		:attr:`~gfapy.lines.collections.Collections.names` property.
213		The identifiers of single line types
214		can be retrieved using the properties
215		:attr:`~gfapy.lines.collections.Collections.segment_names`,
216		:attr:`~gfapy.lines.collections.Collections.edge_names`,
217		:attr:`~gfapy.lines.collections.Collections.gap_names`,
218		:attr:`~gfapy.lines.collections.Collections.path_names` and
219		:attr:`~gfapy.lines.collections.Collections.set_names`.
220
221		.. doctest::
222
223		>>> g = gfapy.Gfa()
224		>>> g.add_line("S\tA\t100\t*")
225		>>> g.add_line("S\tB\t100\t*")
226		>>> g.add_line("S\tC\t100\t*")
227		>>> g.add_line("E\tb_c\tB+\tC+\t0\t10\t90\t100$\t*")
228		>>> g.add_line("O\tp1\tB+ C+")
229		>>> g.add_line("U\ts1\tA b_c g")
230		>>> g.add_line("G\tg\tA+\tB-\t1000\t*")
231		>>> g.names
232		['A', 'B', 'C', 'b_c', 'g', 'p1', 's1']
233		>>> g.segment_names
234		['A', 'B', 'C']
235		>>> g.path_names
236		['p1']
237		>>> g.edge_names
238		['b_c']
239		>>> g.gap_names
240		['g']
241		>>> g.set_names
242		['s1']
243
244		The GFA1 specification does not handle the question of the namespace of
245		identifiers explicitly. However, gfapy assumes and enforces
246		a single namespace for segment, path names and the values of the ID tags
247		of L and C lines. The content of this namespace can be found using
248		:attr:`~gfapy.lines.collections.Collections.names` property.
249		The identifiers of single line types
250		can be retrieved using the properties
251		:attr:`~gfapy.lines.collections.Collections.segment_names`,
252		:attr:`~gfapy.lines.collections.Collections.edge_names`
253		(ID tags of links and containments) and
254		:attr:`~gfapy.lines.collections.Collections.path_names`.
255		For GFA1, the properties
256		:attr:`~gfapy.lines.collections.Collections.gap_names`,
257		:attr:`~gfapy.lines.collections.Collections.set_names`
258		contain always empty lists.
259
260		.. doctest::
261
262		>>> g = gfapy.Gfa()
263		>>> g.add_line("S\tA\t*")
264		>>> g.add_line("S\tB\t*")
265		>>> g.add_line("S\tC\t*")
266		>>> g.add_line("L\tB\t+\tC\t+\t*\tID:Z:b_c")
267		>>> g.add_line("P\tp1\tB+,C+\t*")
268		>>> g.names
269		['A', 'B', 'C', 'b_c', 'p1']
270		>>> g.segment_names
271		['A', 'B', 'C']
272		>>> g.path_names
273		['p1']
274		>>> g.edge_names
275		['b_c']
276		>>> g.gap_names
277		[]
278		>>> g.set_names
279		[]
280
281		Identifiers of external sequences
282		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
283
284		Fragments contain identifiers which refer to external sequences
285		(not contained in the GFA file). According to the specification, the
286		these identifiers are not part of the same namespace as the identifier
287		of the GFA lines. They can be retrieved using the
288		:attr:`~gfapy.lines.collections.Collections.external_names`
289		property.
290
291		.. doctest::
292
293		>>> g = gfapy.Gfa()
294		>>> g.add_line("S\tA\t100\t*")
295		>>> g.add_line("F\tA\tread1+\t10\t30\t0\t20$\t20M")
296		>>> g.external_names
297		['read1']
298
299		The method
300		:func:`Gfa.fragments_for_external(external_ID) <gfapy.lines.finders.Finders.fragments_for_external>`
301		retrieves all F lines with a specified external sequence identifier.
302
303		.. doctest::
304
305		>>> f = g.fragments_for_external('read1')
306		>>> len(f)
307		1
308		>>> str(f[0])
309		'F\tA\tread1+\t10\t30\t0\t20$\t20M'
310
311		Adding new lines
312		~~~~~~~~~~~~~~~~
313
314		New lines can be added to a Gfa instance using the
315		:func:`Gfa.add_line(line) <gfapy.lines.creators.Creators.add_line>`
316		method or its alias
317		:func:`Gfa.append(line) <gfapy.lines.creators.Creators.append>`.
318		The argument can be either a string
319		describing a line with valid GFA syntax, or a :class:`~gfapy.line.line.Line`
320		instance. If a string is added, a line instance is created and
321		then added.
322
323		.. doctest::
324
325		>>> g = gfapy.Gfa()
326		>>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
327		>>> g.segment_names
328		['A']
329		>>> g.append("S\tB\t*") #doctest: +ELLIPSIS
330		>>> g.segment_names
331		['A', 'B']
332
333		Editing the lines
334		~~~~~~~~~~~~~~~~~
335
336		Accessing the information stored in the fields of a line instance is
337		described in the :ref:`positional_fields` and :ref:`tags` chapters.
338
339		In Gfapy, a line instance belonging to a Gfa instance is said
340		to be connected to the Gfa instance. Direct editing the content of a connected
341		line is only possible, for those fields which do not contain
342		references to other lines. For more information on how to modify the content of
343		the fields of connected line, see the :ref:`references` chapter.
344
345		.. doctest::
346
347		>>> g = gfapy.Gfa()
348		>>> e = gfapy.Line("E\t\tA+\tB-\t0\t10\t90\t100$\t")
349		>>> e.sid1 = "C+"
350		>>> g.add_line(e) #doctest: +ELLIPSIS
351		>>> e.sid1 = "A+"
352		Traceback (most recent call last):
353		gfapy.error.RuntimeError: ...
354
355		Removing lines
356		~~~~~~~~~~~~~~
357
358		Disconnecting a line from the Gfa instance is done using the
359		:func:`Gfa.rm(line) <gfapy.lines.destructors.Destructors.rm>` method. The
360		argument can be a line instance or the name of a line.
361
362		In alternative, a line instance can also be disconnected using the
363		`disconnect` method on it. Disconnecting a line
364		may trigger other operations, such as the disconnection of other lines (see the
365		:ref:`references` chapter).
366
367		.. doctest::
368
369		>>> g = gfapy.Gfa()
370		>>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
371		>>> g.segment_names
372		['A']
373		>>> g.rm('A') #doctest: +ELLIPSIS
374		>>> g.segment_names
375		[]
376		>>> g.append("S\tB\t*") #doctest: +ELLIPSIS
377		>>> g.segment_names
378		['B']
379		>>> b = g.line('B')
380		>>> b.disconnect()
381		>>> g.segment_names
382		[]
383
384		Renaming lines
385		~~~~~~~~~~~~~~
386
387		Lines with an identifier can be renamed. This is done simply by editing
388		the corresponding field (such as ``name`` or ``sid`` for a segment).
389		This field is not a reference to another line and can be freely edited
390		also in line instances connected to a Gfa. All references to the line
391		from other lines will still be up to date, as they will refer to the
392		same instance (whose name has been changed) and their string
393		representation will use the new name.
394
395		.. doctest::
396
397		>>> g = gfapy.Gfa()
398		>>> g.add_line("S\tA\t*") #doctest: +ELLIPSIS
399		>>> g.add_line("L\tA\t+\tB\t-\t*") #doctest: +ELLIPSIS
400		>>> g.segment_names
401		['A', 'B']
402		>>> g.dovetails[0].from_name
403		'A'
404		>>> g.segment('A').name = 'C'
405		>>> g.segment_names
406		['B', 'C']
407		>>> g.dovetails[0].from_name
408		'C'

+0

-14

~~doc/tutorial/graph_operations.rst~~ less more

0		.. _graph_operations:
1
2		Graph operations
3		----------------
4
5		Graph operations such as linear paths merging, multiplication of
6		segments and other are provided. These operations are implemented
7		in analogy to those provided by the Ruby library RGFA. As RGFA only
8		handles GFA1 graphs, only dovetail overlaps are considered as
9		connections. A detailed description of the operation can be
10		found in Gonnella and Kurtz (2016). More information about the
11		single operations are found in the method documentation of the
12		submodules of `GraphOperations`.
13

+0

-189

~~doc/tutorial/header.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _header:
6
7		The Header
8		----------
9
10		GFA files may contain one or multiple header lines (record type: "H"). These
11		lines may be present in any part of the file, not necessarily at the beginning.
12
13		Although the header may consist of multiple lines, its content refers to the
14		whole file. Therefore in Gfapy the header is accessed using a single line
15		instance (accessible by the :attr:`~gfapy.lines.headers.Headers.header`
16		property). Header lines contain only tags. If not header line is present in the
17		Gfa, then the header line object will be empty (i.e. contain no tags).
18
19		Note that header lines cannot be connected to the Gfa as other lines (i.e.
20		calling :meth:`~gfapy.line.common.connection.Connection.connect` on them raises
21		an exception). Instead they must be merged to the existing Gfa header, using
22		`add_line` on the Gfa instance.
23
24		.. doctest::
25
26		>>> gfa.add_line("H\tnn:f:1.0") #doctest: +ELLIPSIS
27		>>> gfa.header.nn
28		1.0
29		>>> gfapy.Line("H\tnn:f:1.0").connect(gfa)
30		Traceback (most recent call last):
31		...
32		gfapy.error.RuntimeError: ...
33
34		Multiple definitions of the predefined header tags
35		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
36
37		For the predefined tags (``VN`` and ``TS``), the presence of multiple
38		values in different lines is an error, unless the value is the same in
39		each instance (in which case the repeated definitions are ignored).
40
41		.. doctest::
42
43		>>> gfa.add_line("H\tVN:Z:1.0") #doctest: +ELLIPSIS
44		>>> gfa.add_line("H\tVN:Z:1.0") # ignored #doctest: +ELLIPSIS
45		>>> gfa.add_line("H\tVN:Z:2.0")
46		Traceback (most recent call last):
47		...
48		gfapy.error.VersionError: ...
49
50		Multiple definitions of custom header tags
51		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
52
53		If the tags are present only once in the header in its entirety, the access to
54		the tags is the same as for any other line (see the :ref:`tags` chapter).
55
56		However, the specification does not forbid custom tags to be defined with
57		different values in different header lines (which we name "multi-definition
58		tags"). This particular case is handled in the next sections.
59
60		Reading multi-definitions tags
61		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
62
63		Reading, validating and setting the datatype of multi-definition tags is done
64		using the same methods as for all other lines (see the :ref:`tags` chapter).
65		However, if a tag is defined multiple times on multiple H lines, reading the
66		tag will return a list of the values on the lines. This array is an instance of
67		the subclass ``gfapy.FieldArray`` of list.
68
69		.. doctest::
70
71		>>> gfa.add_line("H\txx:i:1") #doctest: +ELLIPSIS
72		>>> gfa.add_line("H\txx:i:2") #doctest: +ELLIPSIS
73		>>> gfa.add_line("H\txx:i:3") #doctest: +ELLIPSIS
74		>>> gfa.header.xx
75		gfapy.FieldArray('i',[1, 2, 3])
76
77		Setting tags
78		~~~~~~~~~~~~
79
80		There are two possibilities to set a tag for the header. The first is
81		the normal tag interface (using ``set`` or the tag name property). The
82		second is to use ``add``. The latter supports multi-definition tags,
83		i.e. it adds the value to the previous ones (if any), instead of
84		overwriting them.
85
86		.. doctest::
87
88		>>> gfa = gfapy.Gfa()
89		>>> gfa.header.xx
90		>>> gfa.header.add("xx", 1)
91		>>> gfa.header.xx
92		1
93		>>> gfa.header.add("xx", 2)
94		>>> gfa.header.xx
95		gfapy.FieldArray('i',[1, 2])
96		>>> gfa.header.set("xx", 3)
97		>>> gfa.header.xx
98		3
99
100		Modifying field array values
101		~~~~~~~~~~~~~~~~~~~~~~~~~~~~
102
103		Field arrays can be modified directly (e.g. adding new values or
104		removing some values). After modification, the user may check if the
105		array values remain compatible with the datatype of the tag using the
106		:meth:`~gfapy.line.common.validate.Validate.validate_field`` method.
107
108		.. doctest::
109
110		>>> gfa = gfapy.Gfa()
111		>>> gfa.header.xx = gfapy.FieldArray('i',[1,2,3])
112		>>> gfa.header.xx
113		gfapy.FieldArray('i',[1, 2, 3])
114		>>> gfa.header.validate_field("xx")
115		>>> gfa.header.xx.append("X")
116		>>> gfa.header.validate_field("xx")
117		Traceback (most recent call last):
118		...
119		gfapy.error.FormatError: ...
120
121		If the field array is modified using array methods which return a list
122		or data of any other type, a field array must be constructed, setting
123		its datatype to the value returned by calling
124		:meth:`~gfapy.line.common.field_datatype.FieldDatatype.get_datatype`
125		on the header.
126
127		.. doctest::
128
129		>>> gfa = gfapy.Gfa()
130		>>> gfa.header.xx = gfapy.FieldArray('i',[1,2,3])
131		>>> gfa.header.xx
132		gfapy.FieldArray('i',[1, 2, 3])
133		>>> gfa.header.xx = gfapy.FieldArray(gfa.header.get_datatype("xx"),
134		... list(map(lambda x: x+1, gfa.header.xx)))
135		>>> gfa.header.xx
136		gfapy.FieldArray('i',[2, 3, 4])
137
138		String representation of the header
139		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
140
141		For consistency with other line types, the string representation of the header
142		is a single-line string, eventually non standard-compliant, if it contains
143		multiple instances of the tag. (and when calling
144		:meth:`~gfapy.line.common.writer.Writer.field_to_s` for a tag present multiple
145		times, the output string will contain the instances of the tag, separated by
146		tabs).
147
148		However, when the Gfa is output to file or string, the header is split into
149		multiple H lines with single tags, so that standard-compliant GFA is output.
150		The split header can be retrieved using the
151		:attr:`~gfapy.lines.headers.Headers.headers` property of the Gfa instance.
152
153		.. doctest::
154
155		>>> gfa = gfapy.Gfa()
156		>>> gfa.header.VN = "1.0"
157		>>> gfa.header.xx = gfapy.FieldArray('i',[1,2])
158		>>> gfa.header.field_to_s("xx")
159		'1\t2'
160		>>> gfa.header.field_to_s("xx", tag=True)
161		'xx:i:1\txx:i:2'
162		>>> str(gfa.header)
163		'H\tVN:Z:1.0\txx:i:1\txx:i:2'
164		>>> [str(h) for h in gfa.headers]
165		['H\tVN:Z:1.0', 'H\txx:i:1', 'H\txx:i:2']
166		>>> str(gfa)
167		'H\tVN:Z:1.0\nH\txx:i:1\nH\txx:i:2'
168
169		Count the input header lines
170		~~~~~~~~~~~~~~~~~~~~~~~~~~~~
171
172		Due to the different way header lines are stored, the number of header elements
173		is not equal to the number of header lines in the input. This is annoying if an
174		application wants to count the number of input lines in a file. In order to make
175		that possible, the number of input header lines are counted and can be
176		retrieved using the :attr:`~gfapy.lines.headers.Headers.n_input_header_lines`
177		property of the Gfa instance.
178
179		.. doctest::
180
181		>>> gfa = gfapy.Gfa()
182		>>> gfa.add_line("H\txx:i:1\tyy:Z:ABC") #doctest: +ELLIPSIS
183		>>> gfa.add_line("H\txy:i:2") #doctest: +ELLIPSIS
184		>>> gfa.add_line("H\tyz:i:3\tab:A:A") #doctest: +ELLIPSIS
185		>>> len(gfa.headers)
186		5
187		>>> gfa.n_input_header_lines
188		3

+0

-69

~~doc/tutorial/placeholders.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3
4		.. _placeholders:
5
6		Placeholders
7		------------
8
9		Some positional fields may contain an undefined value S: ``sequence``;
10		L/C: ``overlap``; P: ``overlaps``; E: ``eid``, ``alignment``; F:
11		``alignment``; G: ``gid``, ``var``; U/O: ``pid``. In GFA this value is
12		represented by a ``*``.
13
14		In Gfapy the class `Placeholder` represent the undefined value.
15
16		Distinguishing placeholders
17		~~~~~~~~~~~~~~~~~~~~~~~~~~~
18
19		The :func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>` method
20		allows to check if a value is a placeholder; a value is a placeholder if
21		it is a `Placeholder` instance, or would represent
22		a placeholder in GFA (a string containing ``*``), or would be represented
23		by a placeholder in GFA (e.g. an empty array).
24
25		.. doctest::
26
27		>>> gfapy.is_placeholder("*")
28		True
29		>>> gfapy.is_placeholder("**")
30		False
31		>>> gfapy.is_placeholder([])
32		True
33		>>> gfapy.is_placeholder(gfapy.Placeholder())
34		True
35
36		Note that, as a placeholder is ``False`` in boolean context, just a
37		``if not placeholder`` will also work, if the value is an instance
38		of `Placeholder`, but not always for the other cases (in particular not
39		for the string representation ``*``).
40		Therefore using
41		:func:`gfapy.is_placeholder() <gfapy.placeholder.is_placeholder>`
42		is better.
43
44		.. doctest::
45
46		>>> if "": print(' is not a placeholder')
47		* is not a placeholder
48		>>> if gfapy.is_placeholder("*"): print('but it represents a placeholder')
49		but it represents a placeholder
50
51		Compatibility methods
52		~~~~~~~~~~~~~~~~~~~~~
53
54		Some methods are defined for placeholders, which allow them to respond
55		to the same methods as defined values. This allows to write generic
56		code.
57
58		.. doctest::
59
60		>>> placeholder = gfapy.Placeholder()
61		>>> placeholder.validate() # does nothing
62		>>> len(placeholder)
63		0
64		>>> placeholder[1]
65		gfapy.Placeholder()
66		>>> placeholder + 1
67		gfapy.Placeholder()
68

+0

-449

~~doc/tutorial/positional_fields.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _positional_fields:
6
7		Positional fields
8		-----------------
9
10		Most lines in GFA have positional fields (Headers are an exception).
11		During parsing, if a line is encountered, which has too less or too many
12		positional fields, an exception will be thrown. The correct number of
13		positional fields is record type-specific.
14
15		Positional fields are recognized by its position in the line. Each
16		positional field has an implicit field name and datatype associated with
17		it.
18
19		Field names
20		~~~~~~~~~~~
21
22		The field names are derived from the specification. Lower case versions
23		of the field names are used and spaces are substituted with underscores.
24		In some cases, the field names were changed, as they represent keywords
25		in common programming languages or clash with potential tag names
26		(``from``, ``to``, ``send``).
27
28		The following tables shows the field names used in Gfapy, for each kind
29		of line. Headers have no positional fields. Comments and custom records
30		follow particular rules, see the respective chapters (:ref:`comments` and
31		:ref:`custom_records`).
32
33		GFA1 field names
34		^^^^^^^^^^^^^^^^
35
36		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
37		\| Record Type \| Field 1 \| Field 2 \| Field 3 \| Field 4 \| Field 5 \| Field 6 \|
38		+===============+====================+=====================+==================+=================+===============+===============+
39		\| Segment \| ``name`` \| ``sequence`` \| \| \| \| \|
40		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
41		\| Link \| ``from_segment`` \| ``from_orient`` \| ``to_segment`` \| ``to_orient`` \| ``overlap`` \| \|
42		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
43		\| Containment \| ``from_segment`` \| ``from_orient`` \| ``to_segment`` \| ``to_orient`` \| ``pos`` \| ``overlap`` \|
44		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
45		\| Path \| ``path_name`` \| ``segment_names`` \| ``overlaps`` \| \| \| \|
46		+---------------+--------------------+---------------------+------------------+-----------------+---------------+---------------+
47
48		GFA2 field names
49		^^^^^^^^^^^^^^^^
50
51		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
52		\| Record Type \| Field 1 \| Field 2 \| Field 3 \| Field 4 \| Field 5 \| Field 6 \| Field 7 \| Field 8 \|
53		+===============+===========+================+================+=============+=============+=============+=================+=================+
54		\| Segment \| ``sid`` \| ``slen`` \| ``sequence`` \| \| \| \| \| \|
55		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
56		\| Edge \| ``eid`` \| ``sid1`` \| ``sid2`` \| ``beg1`` \| ``end1`` \| ``beg2`` \| ``end2`` \| ``alignment`` \|
57		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
58		\| Fragment \| ``sid`` \| ``external`` \| ``s_beg`` \| ``s_end`` \| ``f_beg`` \| ``f_end`` \| ``alignment`` \| \|
59		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
60		\| Gap \| ``gid`` \| ``sid1`` \| ``d1`` \| ``d2`` \| ``sid2`` \| ``disp`` \| ``var`` \| \|
61		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
62		\| Set \| ``pid`` \| ``items`` \| \| \| \| \| \| \|
63		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
64		\| Path \| ``pid`` \| ``items`` \| \| \| \| \| \| \|
65		+---------------+-----------+----------------+----------------+-------------+-------------+-------------+-----------------+-----------------+
66
67		Datatypes
68		~~~~~~~~~
69
70		The datatype of each positional field is described in the specification
71		and cannot be changed (differently from tags). Here is a short
72		description of the Python classes used to represent data for different
73		datatypes.
74
75		Placeholders
76		^^^^^^^^^^^^
77
78		The positional fields in GFA can never be empty. However, there are some
79		fields with optional values. If a value is not specified, a placeholder
80		character is used instead (``*``). Such undefined values are represented
81		in Gfapy by the `Placeholder` class, which is described more in
82		detail in the :ref:`placeholders` chapter.
83
84		Arrays
85		^^^^^^
86
87		The ``items`` field in unordered and ordered groups and the
88		``segment_names`` and ``overlaps`` fields in paths are lists of objects
89		and are represented by list instances.
90
91		.. doctest::
92
93		>>> set = gfapy.Line("U\t*\t1 A 2")
94		>>> type(set.items)
95		<class 'list'>
96		>>> gfa2_path = gfapy.Line("O\t*\tA+ B-")
97		>>> type(gfa2_path.items)
98		<class 'list'>
99		>>> gfa1_path = gfapy.Line("P\tp1\tA+,B-\t10M,9M1D1M")
100		>>> type(gfa1_path.segment_names)
101		<class 'list'>
102		>>> type(gfa1_path.overlaps)
103		<class 'list'>
104
105		Orientations
106		^^^^^^^^^^^^
107
108		Orientations are represented by strings. The ``gfapy.invert()`` method
109		applied to an orientation string returns the other orientation.
110
111		.. doctest::
112
113		>>> gfapy.invert("+")
114		'-'
115		>>> gfapy.invert("-")
116		'+'
117
118		Identifiers
119		^^^^^^^^^^^
120
121		The identifier of the line itself (available for S, P, E, G, U, O lines)
122		can always be accessed in Gfapy using the ``name`` alias and is
123		represented in Gfapy by a string. If it is optional (E, G, U, O lines)
124		and not specified, it is represented by a Placeholder instance. The
125		fragment identifier is also a string.
126
127		Identifiers which refer to other lines are also present in some line
128		types (L, C, E, G, U, O, F). These are never placeholders and in
129		stand-alone lines are represented by strings. In connected lines they
130		are references to the Line instances to which they refer to (see the
131		:ref:`references` chapter).
132
133		Oriented identifiers
134		^^^^^^^^^^^^^^^^^^^^
135
136		Oriented identifiers (e.g. ``segment_names`` in GFA1 paths) are
137		represented by elements of the class ``gfapy.OrientedLine``. The
138		``segment`` method of the oriented segments returns the segment
139		identifier (or segment reference in connected path lines) and the
140		``orient`` method returns the orientation string. The ``name`` method
141		returns the string of the segment, even if this is a reference to a
142		segment. A new oriented line can be created using the
143		``OL[line, orientation]`` method.
144
145		Calling ``invert`` returns an oriented segment, with inverted
146		orientation. To set the two attributes the methods ``segment=`` and
147		``orient=`` are available.
148
149		Examples:
150
151		.. doctest::
152
153		>>> p = gfapy.Line("P\tP1\ta+,b-\t*")
154		>>> p.segment_names
155		[gfapy.OrientedLine('a','+'), gfapy.OrientedLine('b','-')]
156		>>> sn0 = p.segment_names[0]
157		>>> sn0.line
158		'a'
159		>>> sn0.name
160		'a'
161		>>> sn0.orient
162		'+'
163		>>> sn0.invert()
164		>>> sn0
165		gfapy.OrientedLine('a','-')
166		>>> sn0.orient
167		'-'
168		>>> sn0.line = gfapy.Line('S\tX\t*')
169		>>> str(sn0)
170		'X-'
171		>>> sn0.name
172		'X'
173		>>> sn0 = gfapy.OrientedLine(gfapy.Line('S\tY\t*'), '+')
174
175		Sequences
176		^^^^^^^^^
177
178		Sequences (S field sequence) are represented by strings in Gfapy.
179		Depending on the GFA version, the alphabet definition is more or less
180		restrictive. The definitions are correctly applied by the validation
181		methods.
182
183		The method ``rc()`` is provided to compute the reverse complement of a
184		nucleotidic sequence. The extended IUPAC alphabet is understood by the
185		method. Applied to non nucleotidic sequences, the results will be
186		meaningless:
187
188		.. doctest::
189
190		>>> from gfapy.sequence import rc
191		>>> rc("gcat")
192		'atgc'
193		>>> rc("*")
194		'*'
195		>>> rc("yatc")
196		'gatr'
197		>>> rc("gCat")
198		'atGc'
199		>>> rc("cag", rna=True)
200		'cug'
201
202		Integers and positions
203		^^^^^^^^^^^^^^^^^^^^^^
204
205		The C lines ``pos`` field and the G lines ``disp`` and ``var`` fields
206		are represented by integers. The ``var`` field is optional, and thus can
207		be also a placeholder. Positions are 0-based coordinates.
208
209		The position fields of GFA2 E lines (``beg1, beg2, end1, end2``) and F
210		lines (``s_beg, s_end, f_beg, f_end``) contain a dollar string as suffix
211		if the position is equal to the segment length. For more information,
212		see the :ref:`positions` chapter.
213
214		Alignments
215		^^^^^^^^^^
216
217		Alignments are always optional, ie they can be placeholders. If they are
218		specified they are CIGAR alignments or, only in GFA2, trace alignments.
219		For more details, see the :ref:`alignments` chapter.
220
221		GFA1 datatypes
222		^^^^^^^^^^^^^^
223
224		+------------------------+---------------+--------------------------------+
225		\| Datatype \| Record Type \| Fields \|
226		+========================+===============+================================+
227		\| Identifier \| Segment \| ``name`` \|
228		+------------------------+---------------+--------------------------------+
229		\| \| Path \| ``path_name`` \|
230		+------------------------+---------------+--------------------------------+
231		\| \| Link \| ``from_segment, to_segment`` \|
232		+------------------------+---------------+--------------------------------+
233		\| \| Containment \| ``from_segment, to_segment`` \|
234		+------------------------+---------------+--------------------------------+
235		\| [OrientedIdentifier] \| Path \| ``segment_names`` \|
236		+------------------------+---------------+--------------------------------+
237		\| Orientation \| Link \| ``from_orient, to_orient`` \|
238		+------------------------+---------------+--------------------------------+
239		\| \| Containment \| ``from_orient, to_orient`` \|
240		+------------------------+---------------+--------------------------------+
241		\| Sequence \| Segment \| ``sequence`` \|
242		+------------------------+---------------+--------------------------------+
243		\| Alignment \| Link \| ``overlap`` \|
244		+------------------------+---------------+--------------------------------+
245		\| \| Containment \| ``overlap`` \|
246		+------------------------+---------------+--------------------------------+
247		\| [Alignment] \| Path \| ``overlaps`` \|
248		+------------------------+---------------+--------------------------------+
249		\| Position \| Containment \| ``pos`` \|
250		+------------------------+---------------+--------------------------------+
251
252		GFA2 datatypes
253		^^^^^^^^^^^^^^
254
255		+------------------------+---------------+----------------------------------+
256		\| Datatype \| Record Type \| Fields \|
257		+========================+===============+==================================+
258		\| Itentifier \| Segment \| ``sid`` \|
259		+------------------------+---------------+----------------------------------+
260		\| \| Fragment \| ``sid`` \|
261		+------------------------+---------------+----------------------------------+
262		\| OrientedIdentifier \| Edge \| ``sid1, sid2`` \|
263		+------------------------+---------------+----------------------------------+
264		\| \| Gap \| ``sid1, sid2`` \|
265		+------------------------+---------------+----------------------------------+
266		\| \| Fragment \| ``external`` \|
267		+------------------------+---------------+----------------------------------+
268		\| OptionalIdentifier \| Edge \| ``eid`` \|
269		+------------------------+---------------+----------------------------------+
270		\| \| Gap \| ``gid`` \|
271		+------------------------+---------------+----------------------------------+
272		\| \| U Group \| ``oid`` \|
273		+------------------------+---------------+----------------------------------+
274		\| \| O Group \| ``uid`` \|
275		+------------------------+---------------+----------------------------------+
276		\| [Identifier] \| U Group \| ``items`` \|
277		+------------------------+---------------+----------------------------------+
278		\| [OrientedIdentifier] \| O Group \| ``items`` \|
279		+------------------------+---------------+----------------------------------+
280		\| Sequence \| Segment \| ``sequence`` \|
281		+------------------------+---------------+----------------------------------+
282		\| Alignment \| Edge \| ``alignment`` \|
283		+------------------------+---------------+----------------------------------+
284		\| \| Fragment \| ``alignment`` \|
285		+------------------------+---------------+----------------------------------+
286		\| Position \| Edge \| ``beg1, end1, beg2, end2`` \|
287		+------------------------+---------------+----------------------------------+
288		\| \| Fragment \| ``s_beg, s_end, f_beg, f_end`` \|
289		+------------------------+---------------+----------------------------------+
290		\| Integer \| Gap \| ``disp, var`` \|
291		+------------------------+---------------+----------------------------------+
292
293		Reading and writing positional fields
294		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
295
296		The ``positional_fieldnames`` method returns the list of the names (as
297		strings) of the positional fields of a line. The positional fields can
298		be read using a method on the Gfapy line object, which is called as the
299		field name. Setting the value is done with an equal sign version of the
300		field name method (e.g. segment.slen = 120). In alternative, the
301		``set(fieldname, value)`` and ``get(fieldname)`` methods can also be
302		used.
303
304		.. doctest::
305
306		>>> s_gfa1 = gfapy.Line("S\t1\t*")
307		>>> s_gfa1.positional_fieldnames
308		['name', 'sequence']
309		>>> s_gfa1.name
310		'1'
311		>>> s_gfa1.get("name")
312		'1'
313		>>> s_gfa1.name = "segment2"
314		>>> s_gfa1.name
315		'segment2'
316		>>> s_gfa1.set('name',"3")
317		>>> s_gfa1.name
318		'3'
319
320		When a field is read, the value is converted into an appropriate object.
321		The string representation of a field can be read using the
322		``field_to_s(fieldname)`` method.
323
324		.. doctest::
325
326		>>> gfa = gfapy.Gfa()
327		>>> gfa.add_line("S\ts1\t*")
328		>>> gfa.add_line("L\ts1\t+\ts2\t-\t*")
329		>>> link = gfa.dovetails[0]
330		>>> str(link.from_segment)
331		'S\ts1\t*'
332		>>> link.field_to_s('from_segment')
333		's1'
334
335		When setting a non-string field, the user can specify the value of a tag
336		either as a Python non-string object, or as the string representation of
337		the value.
338
339		.. doctest::
340
341		>>> gfa = gfapy.Gfa(version='gfa1')
342		>>> gfa.add_line("C\ta\t+\tb\t-\t10\t*")
343		>>> c = gfa.containments[0]
344		>>> c.pos
345		10
346		>>> c.pos = 1
347		>>> c.pos
348		1
349		>>> c.pos = "2"
350		>>> c.pos
351		2
352		>>> c.field_to_s("pos")
353		'2'
354
355		Note that setting the value of reference and backreferences-related
356		fields is generally not allowed, when a line instance is connected to a
357		Gfa object (see the :ref:`references` chapter).
358
359		.. doctest::
360
361		>>> gfa = gfapy.Gfa(version='gfa1')
362		>>> l = gfapy.Line("L\ts1\t+\ts2\t-\t*")
363		>>> l.from_name
364		's1'
365		>>> l.from_segment = "s3"
366		>>> l.from_name
367		's3'
368		>>> gfa.add_line(l)
369		>>> l.from_segment = "s4"
370		Traceback (most recent call last):
371		...
372		gfapy.error.RuntimeError: ...
373
374		Validation
375		~~~~~~~~~~
376
377		The content of all positional fields must be a correctly formatted
378		string according to the rules given in the GFA specifications (or a
379		Python object whose string representation is a correctly formatted
380		string).
381
382		Depending on the validation level, more or less checks are done
383		automatically (see the :ref:`validation` chapter). Not regarding which
384		validation level is selected, the user can trigger a manual validation
385		using the ``validate_field(fieldname)`` method for a single field, or
386		using ``validate``, which does a full validation on the whole line,
387		including all positional fields.
388
389		.. doctest::
390
391		>>> line = gfapy.Line("H\txx:i:1")
392		>>> line.validate_field("xx")
393		>>> line.validate()
394
395		Aliases
396		~~~~~~~
397
398		For some fields, aliases are defined, which can be used in all contexts
399		where the original field name is used (i.e. as parameter of a method,
400		and the same setter and getter methods defined for the original field
401		name are also defined for each alias, see below).
402
403		.. doctest::
404
405		>>> gfa1_path = gfapy.Line("P\tX\t1-,2+,3+\t*")
406		>>> gfa1_path.name == gfa1_path.path_name
407		True
408		>>> edge = gfapy.Line("E\t\tA+\tB-\t0\t10\t90\t100$\t")
409		>>> edge.eid == edge.name
410		True
411		>>> containment = gfapy.Line("C\tA\t+\tB\t-\t10\t*")
412		>>> containment.from_segment == containment.container
413		True
414		>>> segment = gfapy.Line("S\t1\t*")
415		>>> segment.sid == segment.name
416		True
417		>>> segment.sid
418		'1'
419		>>> segment.name = '2'
420		>>> segment.sid
421		'2'
422
423		Name
424		^^^^
425
426		Different record types have an identifier field: segments (name in GFA1,
427		sid in GFA2), paths (path\_name), edge (eid), fragment (sid), gap (gid),
428		groups (pid).
429
430		All these fields are aliased to ``name``. This allows the user for
431		example to set the identifier of a line using the ``name=(value)``
432		method using the same syntax for different record types (segments,
433		edges, paths, fragments, gaps and groups).
434
435		Version-specific field names
436		^^^^^^^^^^^^^^^^^^^^^^^^^^^^
437
438		For segments the GFA1 name and the GFA2 sid are equivalent fields. For
439		this reason an alias ``sid`` is defined for GFA1 segments and ``name``
440		for GFA2 segments.
441
442		Crypical field names
443		^^^^^^^^^^^^^^^^^^^^
444
445		The definition of from and to for containments is somewhat cryptic.
446		Therefore following aliases have been defined for containments:
447		container[\_orient] for from[\_\\|segment\\|orient]; contained[\_orient]
448		for to[\_segment\\|orient].

+0

-75

~~doc/tutorial/positions.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3
4		.. _positions:
5
6		Positions
7		---------
8
9		The only position field in GFA1 is the ``pos`` field in the C lines.
10		This represents the starting position of the contained segment in the
11		container segment and is 0-based.
12
13		Some fields in GFA2 E lines (``beg1, beg2, end1, end2``) and F lines
14		(``s_beg, s_end, f_beg, f_end``) are positions. According to the
15		specification, they are 0-based and represent virtual ticks before and
16		after each string in the sequence. Thus ranges are represented similarly
17		to the Python range conventions: e.g. a 1-character prefix of a sequence
18		will have begin 0 and end 1.
19
20		Last positions in GFA2
21		~~~~~~~~~~~~~~~~~~~~~~
22
23		The GFA2 positions must contain an additional string (``$``) appended to
24		the integer, if (and only if) they are the last position in the segment
25		sequence. These particular positions are represented in Gfapy as
26		instances of the class :class:`~gfapy.lastpos.LastPos`.
27
28		To create a lastpos instance, the constructor can be used with an
29		integer, or the string representation (which must end with the dollar
30		sign, otherwise an integer is returned):
31
32		.. doctest::
33
34		>>> str(gfapy.LastPos(12))
35		'12$'
36		>>> gfapy.LastPos("12")
37		12
38		>>> str(gfapy.LastPos("12"))
39		'12'
40		>>> gfapy.LastPos("12$")
41		gfapy.LastPos(12)
42		>>> str(gfapy.LastPos("12$"))
43		'12$'
44
45		Subtracting an integer from a lastpos returns a lastpos if 0 subtracted,
46		an integer otherwise. This allows to do some arithmetic on positions
47		without making them invalid.
48
49		.. doctest::
50
51		>>> gfapy.LastPos(12) - 0
52		gfapy.LastPos(12)
53		>>> gfapy.LastPos(12) - 1
54		11
55
56		The functions :func:`~gfapy.lastpos.islastpos` and
57		:func:`~gfapy.lastpos.isfirstpos` allow to
58		determine if a position value is 0 (first), or the last position, using
59		the same syntax for lastpos and integer instances.
60
61		.. doctest::
62
63		>>> gfapy.isfirstpos(0)
64		True
65		>>> gfapy.islastpos(0)
66		False
67		>>> gfapy.isfirstpos(12)
68		False
69		>>> gfapy.islastpos(12)
70		False
71		>>> gfapy.islastpos(gfapy.LastPos("12"))
72		False
73		>>> gfapy.islastpos(gfapy.LastPos("12$"))
74		True

+0

-444

~~doc/tutorial/references.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _references:
6
7		References
8		----------
9
10		Some fields in GFA lines contain identifiers or lists of identifiers
11		(sometimes followed by orientation strings), which reference other lines
12		of the GFA file. In Gfapy it is possible to follow these references and
13		traverse the graph.
14
15		Connecting a line to a Gfa object
16		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17
18		In stand-alone line instances, the identifiers which reference other
19		lines are either strings containing the line name, pairs of strings
20		(name and orientation) in a ``gfapy.OrientedLine`` object, or lists of
21		lines names or ``gfapy.OrientedLine`` objects.
22
23		Using the ``add_line(line)`` (alias: ``append(line)``) method of the
24		``gfapy.Gfa`` object, or the equivalent ``connect(gfa)`` method of the
25		gfapy.Line instance, a line is added to a Gfa instance (this is done
26		automatically when a GFA file is parsed). All strings expressing
27		references are then changed into references to the corresponding line
28		objects. The method ``is_connected()`` allows to determine if a line is
29		connected to a gfapy instance. The read-only property ``gfa`` contains
30		the ``gfapy.Gfa`` instance to which the line is connected.
31
32		.. doctest::
33
34		>>> gfa = gfapy.Gfa(version='gfa1')
35		>>> link = gfapy.Line("L\tA\t-\tB\t+\t20M")
36		>>> link.is_connected()
37		False
38		>>> link.gfa is None
39		True
40		>>> type(link.from_segment)
41		<class 'str'>
42		>>> gfa.append(link)
43		>>> link.is_connected()
44		True
45		>>> link.gfa #doctest: +ELLIPSIS
46		<gfapy.gfa.Gfa object at ...>
47		>>> type(link.from_segment)
48		<class 'gfapy.line.segment.gfa1.GFA1'>
49
50		References for each record type
51		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
52
53		The following tables describes the references contained in each record
54		type. The notation ``[]`` represent lists.
55
56		GFA1
57		^^^^
58
59		+---------------+-------------------+---------------------------+
60		\| Record type \| Fields \| Type of reference \|
61		+===============+===================+===========================+
62		\| Link \| from_segment, to_segment \| Segment \|
63		+---------------+-------------------+---------------------------+
64		\| Containment \| from_segment, to_segment \| Segment \|
65		+---------------+-------------------+---------------------------+
66		\| Path \| segment\_names, \| [OrientedLine(Segment)] \|
67		+---------------+-------------------+---------------------------+
68		\| \| links (1) \| [OrientedLine(Link)] \|
69		+---------------+-------------------+---------------------------+
70
71		(1): paths contain information in the fields segment\_names and
72		overlaps, which allow to find the identify from which they depend; these
73		links can be retrieved using ``links`` (which is not a field).
74
75		GFA2
76		^^^^
77
78		+---------------+--------------+------------------------------------+
79		\| Record type \| Fields \| Type of reference \|
80		+===============+==============+====================================+
81		\| Edge \| sid1, sid2 \| Segment \|
82		+---------------+--------------+------------------------------------+
83		\| Gap \| sid1, sid2 \| Segment \|
84		+---------------+--------------+------------------------------------+
85		\| Fragment \| sid \| Segment \|
86		+---------------+--------------+------------------------------------+
87		\| Set \| items \| [Edge/Set/Path/Segment] \|
88		+---------------+--------------+------------------------------------+
89		\| Path \| items \| [OrientedLine(Edge/Set/Segment)] \|
90		+---------------+--------------+------------------------------------+
91
92		Backreferences for each record type
93		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
94
95		When a line containing a reference to another line is connected to a Gfa
96		object, backreferences to it are created in the targeted line.
97
98		For each backreference collection a read-only property exist, which is
99		named as the collection (e.g. ``dovetails_L`` for segments). Note that
100		the reference list returned by these arrays are read-only and editing
101		the references is done using other methods (see the section "Editing
102		reference fields" below).
103
104		.. code:: python
105
106		segment.dovetails_L # => [gfapy.line.edge.Link(...), ...]
107
108		The following tables describe the backreferences collections for each
109		record type.
110
111		GFA1
112		^^^^
113
114		+---------------+-------------------------+
115		\| Record type \| Backreferences \|
116		+===============+=========================+
117		\| Segment \| dovetails\_L \|
118		+---------------+-------------------------+
119		\| \| dovetails\_R \|
120		+---------------+-------------------------+
121		\| \| edges\_to\_contained \|
122		+---------------+-------------------------+
123		\| \| edges\_to\_containers \|
124		+---------------+-------------------------+
125		\| \| paths \|
126		+---------------+-------------------------+
127		\| Link \| paths \|
128		+---------------+-------------------------+
129
130		GFA2
131		^^^^
132
133		+---------------+-------------------------+--------+
134		\| Record type \| Backreferences \| Type \|
135		+===============+=========================+========+
136		\| Segment \| dovetails\_L \| E \|
137		+---------------+-------------------------+--------+
138		\| \| dovetails\_R \| E \|
139		+---------------+-------------------------+--------+
140		\| \| edges\_to\_contained \| E \|
141		+---------------+-------------------------+--------+
142		\| \| edges\_to\_containers \| E \|
143		+---------------+-------------------------+--------+
144		\| \| internals \| E \|
145		+---------------+-------------------------+--------+
146		\| \| gaps\_L \| G \|
147		+---------------+-------------------------+--------+
148		\| \| gaps\_R \| G \|
149		+---------------+-------------------------+--------+
150		\| \| fragments \| F \|
151		+---------------+-------------------------+--------+
152		\| \| paths \| O \|
153		+---------------+-------------------------+--------+
154		\| \| sets \| U \|
155		+---------------+-------------------------+--------+
156		\| Edge \| paths \| O \|
157		+---------------+-------------------------+--------+
158		\| \| sets \| U \|
159		+---------------+-------------------------+--------+
160		\| O Group \| paths \| O \|
161		+---------------+-------------------------+--------+
162		\| \| sets \| U \|
163		+---------------+-------------------------+--------+
164		\| U Group \| sets \| U \|
165		+---------------+-------------------------+--------+
166
167		Segment backreference convenience methods
168		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
169
170		For segments, additional methods are available which combine in
171		different way the backreferences information. The
172		`dovetails_of_end` and `gaps_of_end` methods take an
173		argument ``L`` or ``R`` and return the dovetails overlaps (or gaps) of the
174		left or, respectively, right end of the segment sequence
175		(equivalent to the segment properties ``dovetails_L``/``dovetails_R`` and
176		``gaps_L``/``gaps_R``).
177
178		The segment ``containments`` property is a list of both containments where the
179		segment is the container or the contained segment. The segment ``edges``
180		property is a list of all edges (dovetails, containments and internals)
181		with a reference to the segment.
182
183		Other methods directly compute list of segments from the edges lists
184		mentioned above. The ``neighbours_L``, ``neighbours_R`` properties and
185		the `neighbours` method compute the set of segment instances which are
186		connected by dovetails to the segment.
187		The ``containers`` and ``contained``
188		properties similarly compute the set of segment instances which,
189		respectively, contains the segment, or are contained in the segment.
190
191		.. doctest::
192
193		>>> gfa = gfapy.Gfa()
194		>>> gfa.append('S\tA\t*')
195		>>> s = gfa.segment('A')
196		>>> gfa.append('S\tB\t*')
197		>>> gfa.append('S\tC\t*')
198		>>> gfa.append('L\tA\t-\tB\t+\t*')
199		>>> gfa.append('C\tA\t+\tC\t+\t10\t*')
200		>>> [str(l) for l in s.dovetails_of_end("L")]
201		['L\tA\t-\tB\t+\t*']
202		>>> s.dovetails_L == s.dovetails_of_end("L")
203		True
204		>>> s.gaps_of_end("R")
205		[]
206		>>> [str(e) for e in s.edges]
207		['L\tA\t-\tB\t+\t', 'C\tA\t+\tC\t+\t10\t']
208		>>> [str(n) for n in s.neighbours_L]
209		['S\tB\t*']
210		>>> s.containers
211		[]
212		>>> [str(c) for c in s.contained]
213		['S\tC\t*']
214
215		Multiline group definitions
216		~~~~~~~~~~~~~~~~~~~~~~~~~~~
217
218		The GFA2 specification opens the possibility (experimental) to define
219		groups on multiple lines, by using the same ID for each line defining
220		the group. This is supported by gfapy.
221
222		This means that if multiple `Ordered` or
223		`Unordered` instances connected to a Gfa object have
224		the same ``gid``, they are merged into a single instance (technically
225		the last one getting added to the graph object). The items list are
226		merged.
227
228		The tags of multiple line defining a group shall not contradict each
229		other (i.e. either are the tag names on different lines defining the
230		group all different, or, if the same tag is present on different lines,
231		the value and datatype must be the same, in which case the multiple
232		definition will be ignored).
233
234		.. doctest::
235
236		>>> gfa = gfapy.Gfa()
237		>>> gfa.add_line("U\tu1\ts1 s2 s3")
238		>>> [s.name for s in gfa.sets[-1].items]
239		['s1', 's2', 's3']
240		>>> gfa.add_line('U\tu1\t4 5')
241		>>> [s.name for s in gfa.sets[-1].items]
242		['s1', 's2', 's3', '4', '5']
243
244		Induced set and captured path
245		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
246
247		The item list in GFA2 sets and paths may not contain elements which are
248		implicitly involved. For example a path may contain segments, without
249		specifying the edges connecting them, if there is only one such edge.
250		Alternatively a path may contain edges, without explicitly indicating the
251		segments. Similarly a set may contain edges, but not the segments
252		referred to in them, or contain segments which are connected by edges,
253		without the edges themselves. Furthermore groups may refer to other
254		groups (set to sets or paths, paths to paths only), which then
255		indirectly contain references to segments and edges.
256
257		Gfapy provides methods for the computation of the sets of segments and
258		edges which are implied by an ordered or unordered group. Thereby all
259		references to subgroups are resolved and implicit elements are added, as
260		described in the specification. The computation can, therefore, only be
261		applied to connected lines. For unordered groups, this computation is
262		provided by the method ``induced_set()``, which returns an array of
263		segment and edge instances. For ordered group, the computation is
264		provided by the method ``captured_path()``, which returns a list of
265		``gfapy.OrientedLine`` instances, alternating segment and edge instances
266		(and starting and ending in segments).
267
268		The methods ``induced_segments_set()``, ``induced_edges_set()``,
269		``captured_segments()`` and ``captured_edges()`` return, respectively,
270		the list of only segments or edges, in ordered or unordered groups.
271
272		.. doctest::
273
274		>>> gfa = gfapy.Gfa()
275		>>> gfa.add_line("S\ts1\t100\t*")
276		>>> gfa.add_line("S\ts2\t100\t*")
277		>>> gfa.add_line("S\ts3\t100\t*")
278		>>> gfa.add_line("E\te1\ts1+\ts2-\t0\t10\t90\t100$\t*")
279		>>> gfa.add_line("U\tu1\ts1 s2 s3")
280		>>> u = gfa.sets[-1]
281		>>> [l.name for l in u.induced_edges_set]
282		['e1']
283		>>> [l.name for l in u.induced_segments_set ]
284		['s1', 's2', 's3']
285		>>> [l.name for l in u.induced_set ]
286		['s1', 's2', 's3', 'e1']
287
288		Disconnecting a line from a Gfa object
289		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
290
291		Lines can be disconnected using the ``rm(line)`` method of the
292		``gfapy.Gfa`` object or the ``disconnect()`` method of the line
293		instance.
294
295		.. doctest::
296
297		>>> gfa = gfapy.Gfa()
298		>>> gfa.append('S\tsA\t*')
299		>>> gfa.append('S\tsB\t*')
300		>>> line = gfa.segment("sA")
301		>>> gfa.segment_names
302		['sA', 'sB']
303		>>> gfa.rm(line)
304		>>> gfa.segment_names
305		['sB']
306		>>> line = gfa.segment('sB')
307		>>> line.disconnect()
308		>>> gfa.segment_names
309		[]
310
311		Disconnecting a line affects other lines as well. Lines which are
312		dependent on the disconnected line are disconnected as well. Any other
313		reference to disconnected lines is removed as well. In the disconnected
314		line, references to lines are transformed back to strings and
315		backreferences are deleted.
316
317		The following tables show which dependent lines are disconnected if they
318		refer to a line which is being disconnected.
319
320		GFA1
321		^^^^
322
323		+---------------+---------------------------------+
324		\| Record type \| Dependent lines \|
325		+===============+=================================+
326		\| Segment \| links (+ paths), containments \|
327		+---------------+---------------------------------+
328		\| Link \| paths \|
329		+---------------+---------------------------------+
330
331		GFA2
332		^^^^
333
334		+---------------+---------------------------------------+
335		\| Record type \| Dependent lines \|
336		+===============+=======================================+
337		\| Segment \| edges, gaps, fragments, sets, paths \|
338		+---------------+---------------------------------------+
339		\| Edge \| sets, paths \|
340		+---------------+---------------------------------------+
341		\| Sets \| sets, paths \|
342		+---------------+---------------------------------------+
343
344		Editing reference fields
345		~~~~~~~~~~~~~~~~~~~~~~~~
346
347		In connected line instances, it is not allowed to directly change the
348		content of fields containing references to other lines, as this would
349		make the state of the Gfa object invalid.
350
351		Besides the fields containing references, some other fields are
352		read-only in connected lines. Changing some of the fields would require
353		moving the backreferences to other collections (position fields of edges
354		and gaps, ``from_orient`` and ``to_orient`` of links). The overlaps
355		field of connected links is readonly as it may be necessary to identify
356		the link in paths.
357
358		Renaming an element
359		^^^^^^^^^^^^^^^^^^^
360
361		The name field of a line (e.g. segment ``name``/``sid``) is not a
362		reference and thus can be edited also in connected lines. When the name
363		of the line is changed, no manual editing of references (e.g.
364		``from_segment``/``to_segment``
365		fields in links) is necessary, as all lines which refer to the line will
366		still refer to the same instance. The references to the instance in the
367		Gfa lines collections will be automatically updated. Also, the new name
368		will be correctly used when converting to string, such as when the Gfa
369		instance is written to a GFA file.
370
371		Renaming a line to a name which already exists has the same effect of
372		adding a line with that name. That is, in most cases,
373		``gfapy.NotUniqueError`` is raised. An exception are GFA2 sets and
374		paths: in this case the line will be appended to the existing line with
375		the same name (as described in "Multiline group definitions").
376
377		Adding and removing group elements
378		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
379
380		Elements of GFA2 groups can be added and removed from both connected and
381		non-connected lines, using the following methods.
382
383		To add an item to or remove an item from an unordered group, use the
384		methods ``add_item(item)`` and ``rm_item(item)``, which take as argument
385		either a string (identifier) or a line instance.
386
387		To append or prepend an item to an ordered group, use the methods
388		``append_item(item)`` and ``prepend_item(item)``. To remove the first or
389		the last item of an ordered group use the methods ``rm_first_item()``
390		and ``rm_last_item()``.
391
392		Editing read-only fields of connected lines
393		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
394
395		Editing the read-only information of edges, gaps, links, containments,
396		fragments and paths is more complicated. These lines shall be
397		disconnected before the edit and connected again to the Gfa object after
398		it. Before disconnecting a line, you should check if there are other
399		lines dependent on it (see tables above). If so, you will have to
400		disconnect these lines first, eventually update their fields and
401		reconnect them at the end of the operation.
402
403		Virtual lines
404		~~~~~~~~~~~~~
405
406		The order of the lines in GFA is not prescribed. Therefore, during
407		parsing, or constructing a Gfa in memory, it is possible that a line is
408		referenced to, before it is added to the Gfa instance. Whenever this
409		happens, Gfapy creates a "virtual" line instance.
410
411		Users do not have to handle with virtual lines, if they work with
412		complete and valid GFA files.
413
414		Virtual lines are similar to normal line instances, with some
415		limitations (they contain only limited information and it is not allowed
416		to add tags to them). To check if a line is a virtual line, one can use
417		the ``virtual`` property of the line.
418
419		As soon as the parser founds the real line corresponding to a previously
420		introduced virtual line, the virtual line is exchanged with the real
421		line and all references are corrected to point to the real line.
422
423		.. doctest::
424
425		>>> g = gfapy.Gfa()
426		>>> g.add_line("S\t1\t*")
427		>>> g.add_line("L\t1\t+\t2\t+\t*")
428		>>> l = g.dovetails[0]
429		>>> g.segment("1").virtual
430		False
431		>>> g.segment("2").virtual
432		True
433		>>> l.to_segment == g.segment("2")
434		True
435		>>> g.segment("2").dovetails == [l]
436		True
437		>>> g.add_line("S\t2\t*")
438		>>> g.segment("2").virtual
439		False
440		>>> l.to_segment == g.segment("2")
441		True
442		>>> g.segment("2").dovetails == [l]
443		True

+0

-36

~~doc/tutorial/rgfa.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3
4		.. _rgfa:
5
6		rGFA
7		----
8
9		rGFA (https://github.com/lh3/gfatools/blob/master/doc/rGFA.md)
10		is a subset of GFA1, in which only particular line types (S and L)
11		are allowed, and the S lines are required to contain the tags
12		`SN` (of type `Z`), `SO` and `SR` (of type `i`).
13
14		When working with rGFA files, it is convenient to use the `dialect="rgfa"`
15		option in the constructor `Gfa()` and in
16		func:`Gfa.from_file() <gfapy.gfa.Gfa.from_file>`.
17
18		This ensures that additional validations are performed: GFA version must be 1,
19		only rGFA-compatible lines (S,L) are allowed and that the required tags are
20		required (with the correct datatype). The validations can also be executed
21		manually using `Gfa.validate_rgfa() <gfapy.gfa.Gfa.validate_rgfa>`.
22
23		Furthermore, the `stable_sequence_names` attribute of the GFA objects
24		returns the set of stable sequence names contained in the `SN` tags
25		of the segments.
26
27		.. doctest::
28
29		>>> g = gfapy.Gfa("S\tS1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0", dialect="rgfa")
30		>>> g.segment_names
31		['S1']
32		>>> g.stable_sequence_names
33		['chr1']
34		>>> g.add_line("S\tS2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0")
35

+0

-436

~~doc/tutorial/tags.rst~~ less more

0		.. testsetup:: *
1
2		import gfapy
3		gfa = gfapy.Gfa()
4
5		.. _tags:
6
7		Tags
8		----
9
10		Each record in GFA can contain tags. Tags are fields which consist in a
11		tag name, a datatype and data. The format is ``NN:T:DATA`` where ``NN``
12		is a two-letter tag name, ``T`` is a one-letter datatype string and
13		``DATA`` is a string representing the data according to the specified
14		datatype. Tag names must be unique for each line, i.e. each line may
15		only contain a tag once.
16
17		::
18
19		# Examples of GFA tags of different datatypes:
20		"aa:i:-12"
21		"bb:f:1.23"
22		"cc:Z:this is a string"
23		"dd:A:X"
24		"ee:B:c,12,3,2"
25		"ff:H:122FA0"
26		'gg:J:["A","B"]'
27
28		Custom tags
29		~~~~~~~~~~~
30
31		Some tags are explicitly defined in the specification (these are named
32		predefined tags in Gfapy), and the user or an application can define
33		its own custom tags. These may contain lower case letters.
34
35		Custom tags are user or program specific and may of course collide with
36		the tags used by other users or programs. For this reasons, if you write
37		scripts which employ custom tags, you should always check that the
38		values are of the correct datatype and plausible.
39
40		.. doctest::
41
42		>>> line = gfapy.Line("H\txx:i:2")
43		>>> if line.get_datatype("xx") != "i":
44		... raise Exception("I expected the tag xx to contain an integer!")
45		>>> myvalue = line.xx
46		>>> if (myvalue > 120) or (myvalue % 2 == 1):
47		... raise Exception("The value in the xx tag is not an even value <= 120")
48		>>> # ... do something with myvalue
49
50		Also it is good practice to allow the user of the script to change the
51		name of the custom tags. For example, Gfapy employs the +or+ custom tag
52		to track the original segment from which a segment in the final graph is
53		derived. All methods which read or write the +or+ tag allow to specify
54		an alternative tag name to use instead of +or+, for the case that this
55		name collides with the custom tag of another program.
56
57		.. code:: python
58
59		# E.g. a method which does something with myvalue, usually stored in tag xx
60		# allows the user to specify an alternative name for the tag
61		def mymethod(line, mytag="xx"):
62		myvalue = line.get(mytag)
63		# ...
64
65		Predefined tags
66		~~~~~~~~~~~~~~~
67
68		According to the GFA specifications, predefined tag names consist of either
69		two upper case letters, or an upper case letter followed by a digit.
70		The GFA1 specification predefines tags for each line type, while GFA2
71		only predefines tags for the header and edges.
72
73		While tags with the predefined names are allowed to be added to any line,
74		when they are used in the lines mentiones in the specification (e.g. `VN`
75		in the header) gfapy checks that the datatype is the one prescribed by
76		the specification (e.g. `VN` must be of type `Z`). It is not forbidden
77		to use the same tags in other contexts, but in this case, the datatype
78		restriction is not enforced.
79
80		+------------+------------+-----------------------+
81		\| Tag \| Type \| Line types \| GFA version \|
82		+============+============+=======================+
83		\| VN \| Z \| H \| 1,2 \|
84		+-----+------+------------+-----------------------+
85		\| TS \| i \| H,S \| 2 \|
86		+-----+------+------------+-----------------------+
87		\| LN \| i \| S \| 1 \|
88		+-----+------+------------+-----------------------+
89		\| RC \| i \| S,L,C \| 1 \|
90		+-----+------+------------+-----------------------+
91		\| FC \| i \| S,L \| 1 \|
92		+-----+------+------------+-----------------------+
93		\| KC \| i \| S,L \| 1 \|
94		+-----+------+------------+-----------------------+
95		\| SH \| H \| S \| 1 \|
96		+-----+------+------------+-----------------------+
97		\| UR \| Z \| S \| 1 \|
98		+-----+------+------------+-----------------------+
99		\| MQ \| i \| L \| 1 \|
100		+-----+------+------------+-----------------------+
101		\| NM \| i \| L,i \| 1 \|
102		+-----+------+------------+-----------------------+
103		\| ID \| Z \| L,C \| 1 \|
104		+-----+------+------------+-----------------------+
105
106		::
107
108		"VN:Z:1.0" # VN => predefined tag
109		"z5:Z:1.0" # z5 first char is downcase => custom tag
110		"XX:Z:aaa" # XX upper case, but not predefined => custom tag
111
112		# not forbidden, but not recommended:
113		"zZ:Z:1.0" # => mixed case, first char downcase => custom tag
114		"Zz:Z:1.0" # => mixed case, first char upcase => custom tag
115		"vn:Z:1.0" # => same name as predefined tag, but downcase => custom tag
116
117		Datatypes
118		~~~~~~~~~
119
120		The following table summarizes the datatypes available for tags:
121
122		+----------+-----------------+---------------------------+----------------------+
123		\| Symbol \| Datatype \| Example \| Python class \|
124		+==========+=================+===========================+======================+
125		\| Z \| string \| This is a string \| str \|
126		+----------+-----------------+---------------------------+----------------------+
127		\| i \| integer \| -12 \| int \|
128		+----------+-----------------+---------------------------+----------------------+
129		\| f \| float \| 1.2E-5 \| float \|
130		+----------+-----------------+---------------------------+----------------------+
131		\| A \| char \| X \| str \|
132		+----------+-----------------+---------------------------+----------------------+
133		\| J \| JSON \| [1,{"k1":1,"k2":2},"a"] \| list/dict \|
134		+----------+-----------------+---------------------------+----------------------+
135		\| B \| numeric array \| f,1.2,13E-2,0 \| gfapy.NumericArray \|
136		+----------+-----------------+---------------------------+----------------------+
137		\| H \| byte array \| FFAA01 \| gfapy.ByteArray \|
138		+----------+-----------------+---------------------------+----------------------+
139
140		Validation
141		~~~~~~~~~~
142
143		The tag names must consist of a letter and a digit or two letters.
144
145		::
146
147		"KC:i:1" # => OK
148		"xx:i:1" # => OK
149		"x1:i:1" # => OK
150		"xxx:i:1" # => error: name is too long
151		"x:i:1" # => error: name is too short
152		"11:i:1" # => error: at least one letter must be present
153
154		The datatype must be one of the datatypes specified above. For
155		predefined tags, Gfapy also checks that the datatype given in the
156		specification is used.
157
158		::
159
160		"xx:X:1" # => error: datatype X is unknown
161		"VN:i:1" # => error: VN must be of type Z
162
163		The data must be a correctly formatted string for the specified datatype
164		or a Python object whose string representation is a correctly formatted
165		string.
166
167		.. doctest::
168
169		# current value: xx:i:2
170		>>> line = gfapy.Line("S\tA\t*\txx:i:2")
171		>>> line.xx = 1
172		>>> line.xx
173		1
174		>>> line.xx = "3"
175		>>> line.xx
176		3
177		>>> line.xx = "A"
178		>>> line.xx
179		Traceback (most recent call last):
180		...
181		gfapy.error.FormatError: ...
182
183		Depending on the validation level, more or less checks are done
184		automatically (see :ref:`validation` chapter). Per default - validation level
185		(1) - validation is performed only during parsing or accessing values
186		the first time, therefore the user must perform a manual validation if
187		he changes values to something which is not guaranteed to be correct. To
188		trigger a manual validation, the user can call the method
189		``validate_field(fieldname)`` to validate a single tag, or
190		``validate()`` to validate the whole line, including all tags.
191
192		.. doctest::
193
194		>>> line = gfapy.Line("S\tA\t*\txx:i:2", vlevel = 0)
195		>>> line.validate_field("xx")
196		>>> line.validate()
197		>>> line.xx = "A"
198		>>> line.validate_field("xx")
199		Traceback (most recent call last):
200		...
201		gfapy.error.FormatError: ...
202		>>> line.validate()
203		Traceback (most recent call last):
204		...
205		gfapy.error.FormatError: ...
206		>>> line.xx = "3"
207		>>> line.validate_field("xx")
208		>>> line.validate()
209
210		Reading and writing tags
211		~~~~~~~~~~~~~~~~~~~~~~~~
212
213		Tags can be read using a property on the Gfapy line object, which is
214		called as the tag (e.g. line.xx). A special version of the property
215		prefixed by ``try_get_`` raises an error if the tag was not available
216		(e.g. ``line.try_get_LN``), while the tag property (e.g. ``line.LN``)
217		would return ``None`` in this case. Setting the value is done assigning
218		a value to it the tag name method (e.g. ``line.TS = 120``). In
219		alternative, the ``set(fieldname, value)``, ``get(fieldname)`` and
220		``try_get(fieldname)`` methods can also be used. To remove a tag from a
221		line, use the ``delete(fieldname)`` method, or set its value to
222		``None``. The ``tagnames`` property Line instances is a list of
223		the names (as strings) of all defined tags for a line.
224
225
226		.. doctest::
227
228		>>> line = gfapy.Line("S\tA\t*\txx:i:1", vlevel = 0)
229		>>> line.xx
230		1
231		>>> line.xy is None
232		True
233		>>> line.try_get_xx()
234		1
235		>>> line.try_get_xy()
236		Traceback (most recent call last):
237		...
238		gfapy.error.NotFoundError: ...
239		>>> line.get("xx")
240		1
241		>>> line.try_get("xy")
242		Traceback (most recent call last):
243		...
244		gfapy.error.NotFoundError: ...
245		>>> line.xx = 2
246		>>> line.xx
247		2
248		>>> line.xx = "a"
249		>>> line.tagnames
250		['xx']
251		>>> line.xy = 2
252		>>> line.xy
253		2
254		>>> line.set("xy", 3)
255		>>> line.get("xy")
256		3
257		>>> line.tagnames
258		['xx', 'xy']
259		>>> line.delete("xy")
260		3
261		>>> line.xy is None
262		True
263		>>> line.xx = None
264		>>> line.xx is None
265		True
266		>>> line.try_get("xx")
267		Traceback (most recent call last):
268		...
269		gfapy.error.NotFoundError: ...
270		>>> line.tagnames
271		[]
272
273		When a tag is read, the value is converted into an appropriate object
274		(see Python classes in the datatype table above). When setting a value,
275		the user can specify the value of a tag either as a Python object, or as
276		the string representation of the value.
277
278		.. doctest::
279
280		>>> line = gfapy.Line('H\txx:i:1\txy:Z:TEXT\txz:J:["a","b"]')
281		>>> line.xx
282		1
283		>>> isinstance(line.xx, int)
284		True
285		>>> line.xy
286		'TEXT'
287		>>> isinstance(line.xy, str)
288		True
289		>>> line.xz
290		['a', 'b']
291		>>> isinstance(line.xz, list)
292		True
293
294		The string representation of a tag can be read using the
295		``field_to_s(fieldname)`` method. The default is to only output the
296		content of the field. By setting \`\`tag: true\`\`\`, the entire tag is
297		output (name, datatype, content, separated by colons). An exception is
298		raised if the field does not exist.
299
300		.. doctest::
301
302		>>> line = gfapy.Line("H\txx:i:1")
303		>>> line.xx
304		1
305		>>> line.field_to_s("xx")
306		'1'
307		>>> line.field_to_s("xx", tag=True)
308		'xx:i:1'
309
310		Datatype of custom tags
311		~~~~~~~~~~~~~~~~~~~~~~~
312
313		The datatype of an existing custom field (but not of predefined fields)
314		can be changed using the ``set_datatype(fieldname, datatype)`` method.
315		The current datatype specification can be read using
316		``get_datatype(fieldname)``.
317
318		.. doctest::
319
320		>>> line = gfapy.Line("H\txx:i:1")
321		>>> line.get_datatype("xx")
322		'i'
323		>>> line.set_datatype("xx", "Z")
324		>>> line.get_datatype("xx")
325		'Z'
326
327		If a new custom tag is specified, Gfapy selects the correct datatype for
328		it: i/f for numeric values, J/B for arrays, J for hashes and Z for
329		strings and strings. If the user wants to specify a different datatype,
330		he may do so by setting it with ``set_datatype()`` (this can be done
331		also before assigning a value, which is necessary if full validation is
332		active).
333
334		.. doctest::
335
336		>>> line = gfapy.Line("H")
337		>>> line.xx = "1"
338		>>> line.xx
339		'1'
340		>>> line.set_datatype("xy", "i")
341		>>> line.xy = "1"
342		>>> line.xy
343		1
344
345		Arrays of numerical values
346		~~~~~~~~~~~~~~~~~~~~~~~~~~
347
348		``B`` and ``H`` tags represent array with particular constraints (e.g.
349		they can only contain numeric values, and in some cases the values must
350		be in predefined ranges). In order to represent them correctly and allow
351		for validation, Python classes have been defined for both kind of tags:
352		``gfapy.ByteArray`` for ``H`` and ``gfapy.NumericArray`` for ``B``
353		fields.
354
355		Both are subclasses of list. Object of the two classes can be created by
356		passing an existing list or the string representation to the class
357		constructor.
358
359		.. doctest::
360
361		>>> # create a byte array instance
362		>>> gfapy.ByteArray([12,3,14])
363		b'\x0c\x03\x0e'
364		>>> gfapy.ByteArray("A012FF")
365		b'\xa0\x12\xff'
366		>>> # create a numeric array instance
367		>>> gfapy.NumericArray.from_string("c,12,3,14")
368		[12, 3, 14]
369		>>> gfapy.NumericArray([12,3,14])
370		[12, 3, 14]
371
372		Instances of the classes behave as normal lists, except that they
373		provide a #validate() method, which checks the constraints, and that
374		their string representation is the GFA string representation of the
375		field value.
376
377		.. doctest::
378
379		>>> gfapy.NumericArray([12,1,"1x"]).validate()
380		Traceback (most recent call last):
381		...
382		gfapy.error.ValueError
383		>>> str(gfapy.NumericArray([12,3,14]))
384		'C,12,3,14'
385		>>> gfapy.ByteArray([12,1,"1x"]).validate()
386		Traceback (most recent call last):
387		...
388		gfapy.error.ValueError
389		>>> str(gfapy.ByteArray([12,3,14]))
390		'0C030E'
391
392		For numeric values, the `compute_subtype` method allows to compute
393		the subtype which will be used for the string representation. Unsigned
394		subtypes are used if all values are positive. The smallest possible
395		subtype range is selected. The subtype may change when the range of the
396		elements changes.
397
398		.. doctest::
399
400		>>> gfapy.NumericArray([12,13,14]).compute_subtype()
401		'C'
402
403		Special cases: custom records, headers, comments and virtual lines.
404		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405
406		GFA2 allows custom records, introduced by record type strings other than
407		the predefined ones. Gfapy uses a pragmatical approach for identifying
408		tags in custom records, and tries to interpret the rightmost fields as
409		tags, until the first field from the right raises an error; all
410		remaining fields are treated as positional fields.
411
412		::
413
414		"X a b c xx:i:12" # => xx is tag, a, b, c are positional fields
415		"Y a b xx:i:12 c" # => all positional fields, as c is not a valid tag
416
417		For easier access, the entire header of the GFA is summarized in a
418		single line instance. A class (`FieldArray`) has been defined to
419		handle the special case when multiple H lines define the same tag (see
420		:ref:`header` chapter for details).
421
422		Comment lines are represented by a subclass of the same class
423		(`Line`) as the records. However, they cannot contain tags: the
424		entire line is taken as content of the comment. See the :ref:`comments`
425		chapter for more information about comments.
426
427		::
428
429		"# this is not a tag: xx:i:1" # => xx is not a tag, xx:i:1 is part of the comment
430
431		Virtual instances of the `Line` class (e.g. segment instances automatically
432		created because of not yet resolved references found in edges) cannot be
433		modified by the user, and tags cannot be specified for them. This
434		includes all instances of the `Unknown` class. See the
435		:ref:`references` chapter for more information about virtual lines.

+0

-78

~~doc/tutorial/validation.rst~~ less more

0		.. _validation:
1
2		Validation
3		----------
4
5		Different validation levels are available. They represent different
6		compromises between speed and warrant of validity. The validation level
7		can be specified when the :class:`~gfapy.gfa.Gfa` object is created, using the
8		``vlevel`` parameter of the constructor and of the
9		`from_file` method. Four levels of validation are defined
10		(0 = no validation, 1 = validation by reading, 2 = validation by reading
11		and writing, 3 = continuous validation). The default validation level
12		value is 1.
13
14		Manual validation
15		~~~~~~~~~~~~~~~~~
16
17		Independently from the validation level chosen, the user can always check the
18		value of a field calling
19		:meth:`~gfapy.line.common.validate.Validate.validate_field` on the line
20		instance. If no exception is raised, the field content is valid.
21
22		To check if the entire content of the line is valid, the user can call
23		:meth:`~gfapy.line.common.validate.Validate.validate` on the line instance.
24		This will check all fields and perform cross-field validations, such as
25		comparing the length of the sequence of a GFA1 segment, to the value of the LN
26		tag (if present).
27
28		It is also possible to validate the structure of the GFA, for example to
29		check if there are unresolved references to lines. To do this, use the
30		:meth:`~gfapy.gfa.Gfa.validate` of the :class:`~gfapy.gfa.Gfa` instance.
31
32		No validations
33		~~~~~~~~~~~~~~
34
35		If the validation is set to 0, Gfapy will try to accept any input and
36		never raise an exception. This is not always possible, and in some
37		cases, an exception will still be raised, if the data is invalid.
38
39		Validation when reading
40		~~~~~~~~~~~~~~~~~~~~~~~
41
42		If the validation level is set to 1 or higher, basic validations will be
43		performed, such as checking the number of positional fields, the
44		presence of duplicated tags, the tag datatype of predefined tags.
45		Additionally, all tags will be validated, either during parsing or on
46		first access. Record-type cross-field validations will also be
47		performed.
48
49		In other words, a validation of 1 means that Gfapy guarantees (as good
50		as it can) that the GFA content read from a file is valid, and will
51		raise an exception on accessing the data if not.
52
53		The user is supposed to call `validate_field` after changing
54		a field content to something which can be potentially invalid, or
55		:meth:`~gfapy.line.common.validate.Validate.validate` if potentially
56		cross-field validations could fail.
57
58		Validation when writing
59		~~~~~~~~~~~~~~~~~~~~~~~
60
61		Setting the level to 2 will perform all validations described above,
62		plus validate the fields content when their value is written to string.
63
64		In other words, a validation of 2 means that Gfapy guarantee (as good as
65		it can) that the GFA content read from a file and written to a file is
66		valid and will raise an exception on accessing the data or writing to
67		file if not.
68
69		Continuous validation
70		~~~~~~~~~~~~~~~~~~~~~
71
72		If the validation level is set to 3, all validations for lower levels
73		described above are run, plus a validation of fields contents each time
74		a setter method is used.
75
76		A validation of 3 means that Gfapy guarantees (as good as it can) that
77		the GFA content is always valid.

+1

-0

gfapy/graph_operations/linear_paths.py less more

325	325	merged.name = "_".join(merged.name)
326	326	ortag = merged.get("or")
327	327	if isinstance(ortag, list):
	328	merged.set_datatype("or", "Z")
328	329	merged.set("or", ",".join(ortag))
329	330	if not gfapy.is_placeholder(merged.sequence):
330	331	merged.sequence = "".join(merged.sequence)

+2

-2

gfapy/line/edge/gfa2/alignment_type.py less more

36	36	if gfapy.posvalue(begpos) > gfapy.posvalue(endpos):
37	37	raise gfapy.ValueError(
38	38	"Line: {}\n".format(str(self))+
39		"begin > end: {}$ > {}".format(gfapy.posvalue(begpos),
40		gfapy.posvalue(endpos)))
	39	"begin > end: {} > {}".format(gfapy.posvalue(begpos),
	40	gfapy.posvalue(endpos)))
41	41	if gfapy.isfirstpos(begpos):
42	42	if gfapy.isfirstpos(endpos):
43	43	return ("pfx", True)

+151

-0

gfapy.egg-info/PKG-INFO less more

	0	Metadata-Version: 2.1
	1	Name: gfapy
	2	Version: 1.2.3
	3	Summary: Library for handling data in the GFA1 and GFA2 formats
	4	Home-page: https://github.com/ggonnella/gfapy
	5	Author: Giorgio Gonnella and others (see CONTRIBUTORS)
	6	Author-email: gonnella@zbh.uni-hamburg.de
	7	License: ISC
	8	Keywords: bioinformatics genomics sequences GFA assembly graphs
	9	Classifier: Development Status :: 5 - Production/Stable
	10	Classifier: Environment :: Console
	11	Classifier: Intended Audience :: Developers
	12	Classifier: Intended Audience :: End Users/Desktop
	13	Classifier: Intended Audience :: Science/Research
	14	Classifier: License :: OSI Approved :: ISC License (ISCL)
	15	Classifier: Operating System :: MacOS :: MacOS X
	16	Classifier: Operating System :: POSIX :: Linux
	17	Classifier: Programming Language :: Python :: 3 :: Only
	18	Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
	19	Classifier: Topic :: Software Development :: Libraries
	20	License-File: LICENSE.txt
	21
	22	Gfapy
	23	~~~~~
	24
	25	\|travis\| \|readthedocs\| \|latesttag\| \|license\|
	26
	27	\|bioconda\| \|pypi\| \|debian\| \|ubuntu\|
	28
	29	.. sphinx-begin
	30
	31	The Graphical Fragment Assembly (GFA) are formats for the representation
	32	of sequence graphs, including assembly, variation and splicing graphs.
	33	Two versions of GFA have been defined (GFA1 and GFA2) and several sequence
	34	analysis programs have been adopting the formats as an interchange format,
	35	which allow to easily combine different sequence analysis tools.
	36
	37	This library implements the GFA1 and GFA2 specification
	38	described at https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md.
	39	It allows to create a Gfa object from a file in the GFA format
	40	or from scratch, to enumerate the graph elements (segments, links,
	41	containments, paths and header lines), to traverse the graph (by
	42	traversing all links outgoing from or incoming to a segment), to search for
	43	elements (e.g. which links connect two segments) and to manipulate the
	44	graph (e.g. to eliminate a link or a segment or to duplicate a segment
	45	distributing the read counts evenly on the copies).
	46
	47	The GFA format can be easily extended by users by defining own custom
	48	tags and record types. In Gfapy, it is easy to write extensions modules,
	49	which allow to define custom record types and datatypes for the parsing
	50	and validation of custom fields. The custom lines can be connected, using
	51	references, to each other and to lines of the standard record types.
	52
	53	Requirements
	54	~~~~~~~~~~~~
	55
	56	Gfapy has been written for Python 3 and tested using Python version 3.7.
	57	It does not require any additional Python packages or other software.
	58
	59	Installation
	60	~~~~~~~~~~~~
	61
	62	Gfapy is distributed as a Python package and can be installed using
	63	the Python package manager pip, as well as conda (in the Bioconda channel).
	64	It is also available as a package in some Linux distributions (Debian, Ubuntu).
	65
	66	The following command installs the current stable version from the Python
	67	Packages index::
	68
	69	pip install gfapy
	70
	71	If you would like to install the current development version from Github,
	72	use the following command::
	73
	74	pip install -e git+https://github.com/ggonnella/gfapy.git#egg=gfapy
	75
	76	Alternatively it is possible to install gfapy using conda. Gfapy is
	77	included in the Bioconda (https://bioconda.github.io/) channel::
	78
	79	conda install -c bioconda gfapy
	80
	81	Usage
	82	~~~~~
	83
	84	If you installed gfapy as described above, you can import it in your script
	85	using the conventional Python syntax::
	86
	87	>>> import gfapy
	88
	89	Documentation
	90	~~~~~~~~~~~~~
	91
	92	The documentation, including this introduction to Gfapy, a user manual
	93	and the API documentation is hosted on the ReadTheDocs server,
	94	at the URL http://gfapy.readthedocs.io/en/latest/ and it can be
	95	downloaded as PDF from the URL
	96	https://github.com/ggonnella/gfapy/blob/master/manual/gfapy-manual.pdf.
	97
	98	References
	99	~~~~~~~~~~
	100
	101	Giorgio Gonnella and Stefan Kurtz "GfaPy: a flexible and extensible software
	102	library for handling sequence graphs in Python", Bioinformatics (2017) btx398
	103	https://doi.org/10.1093/bioinformatics/btx398
	104
	105	.. sphinx-end
	106
	107	.. \|travis\|
	108	image:: https://travis-ci.com/ggonnella/gfapy.svg?branch=master
	109	:target: https://travis-ci.com/ggonnella/gfapy
	110	:alt: Travis
	111
	112	.. \|latesttag\|
	113	image:: https://img.shields.io/github/v/tag/ggonnella/gfapy
	114	:target: https://github.com/ggonnella/gfapy/tags
	115	:alt: Latest GitHub tag
	116
	117	.. \|readthedocs\|
	118	image:: https://readthedocs.org/projects/pip/badge/?version=stable
	119	:target: https://pip.pypa.io/en/stable/?badge=stable
	120	:alt: ReadTheDocs
	121
	122	.. \|bioconda\|
	123	image:: https://img.shields.io/conda/vn/bioconda/gfapy
	124	:target: https://bioconda.github.io/recipes/gfapy/README.html
	125	:alt: Bioconda
	126
	127	.. \|pypi\|
	128	image:: https://img.shields.io/pypi/v/gfapy
	129	:target: https://pypi.org/project/gfapy/
	130	:alt: PyPI
	131
	132	.. \|debian\|
	133	image:: https://img.shields.io/debian/v/gfapy
	134	:target: https://packages.debian.org/search?keywords=gfapy
	135	:alt: Debian
	136
	137	.. \|ubuntu\|
	138	image:: https://img.shields.io/ubuntu/v/gfapy
	139	:target: https://packages.ubuntu.com/search?keywords=gfapy
	140	:alt: Ubuntu
	141
	142	.. \|license\|
	143	image:: https://img.shields.io/pypi/l/gfapy
	144	:target: https://github.com/ggonnella/gfapy/blob/master/LICENSE.txt
	145	:alt: ISC License
	146
	147	.. \|requiresio\|
	148	image:: https://requires.io/github/ggonnella/gfapy/requirements.svg?branch=master
	149	:target: https://requires.io/github/ggonnella/gfapy/requirements/?branch=master
	150	:alt: Requirements Status

+338

-0

gfapy.egg-info/SOURCES.txt less more

	0	LICENSE.txt
	1	MANIFEST.in
	2	README.rst
	3	setup.cfg
	4	setup.py
	5	bin/gfapy-convert
	6	bin/gfapy-mergelinear
	7	bin/gfapy-renumber
	8	bin/gfapy-validate
	9	gfapy/__init__.py
	10	gfapy/byte_array.py
	11	gfapy/error.py
	12	gfapy/field_array.py
	13	gfapy/gfa.py
	14	gfapy/lastpos.py
	15	gfapy/logger.py
	16	gfapy/numeric_array.py
	17	gfapy/oriented_line.py
	18	gfapy/placeholder.py
	19	gfapy/rgfa.py
	20	gfapy/segment_end.py
	21	gfapy/segment_end_path.py
	22	gfapy/sequence.py
	23	gfapy/symbol_invert.py
	24	gfapy.egg-info/PKG-INFO
	25	gfapy.egg-info/SOURCES.txt
	26	gfapy.egg-info/dependency_links.txt
	27	gfapy.egg-info/not-zip-safe
	28	gfapy.egg-info/top_level.txt
	29	gfapy/alignment/__init__.py
	30	gfapy/alignment/alignment.py
	31	gfapy/alignment/cigar.py
	32	gfapy/alignment/placeholder.py
	33	gfapy/alignment/trace.py
	34	gfapy/field/__init__.py
	35	gfapy/field/alignment_gfa1.py
	36	gfapy/field/alignment_gfa2.py
	37	gfapy/field/alignment_list_gfa1.py
	38	gfapy/field/byte_array.py
	39	gfapy/field/char.py
	40	gfapy/field/comment.py
	41	gfapy/field/custom_record_type.py
	42	gfapy/field/field.py
	43	gfapy/field/float.py
	44	gfapy/field/generic.py
	45	gfapy/field/identifier_gfa2.py
	46	gfapy/field/identifier_list_gfa2.py
	47	gfapy/field/integer.py
	48	gfapy/field/json.py
	49	gfapy/field/numeric_array.py
	50	gfapy/field/optional_identifier_gfa2.py
	51	gfapy/field/optional_integer.py
	52	gfapy/field/orientation.py
	53	gfapy/field/oriented_identifier_gfa2.py
	54	gfapy/field/oriented_identifier_list_gfa1.py
	55	gfapy/field/oriented_identifier_list_gfa2.py
	56	gfapy/field/parser.py
	57	gfapy/field/path_name_gfa1.py
	58	gfapy/field/position_gfa1.py
	59	gfapy/field/position_gfa2.py
	60	gfapy/field/segment_name_gfa1.py
	61	gfapy/field/sequence_gfa1.py
	62	gfapy/field/sequence_gfa2.py
	63	gfapy/field/string.py
	64	gfapy/field/validator.py
	65	gfapy/field/writer.py
	66	gfapy/graph_operations/__init__.py
	67	gfapy/graph_operations/artifacts.py
	68	gfapy/graph_operations/copy_number.py
	69	gfapy/graph_operations/graph_operations.py
	70	gfapy/graph_operations/invertible_segments.py
	71	gfapy/graph_operations/linear_paths.py
	72	gfapy/graph_operations/multiplication.py
	73	gfapy/graph_operations/p_bubbles.py
	74	gfapy/graph_operations/redundant_linear_paths.py
	75	gfapy/graph_operations/superfluous_links.py
	76	gfapy/graph_operations/topology.py
	77	gfapy/line/__init__.py
	78	gfapy/line/line.py
	79	gfapy/line/comment/__init__.py
	80	gfapy/line/comment/comment.py
	81	gfapy/line/comment/construction.py
	82	gfapy/line/comment/tags.py
	83	gfapy/line/comment/version_conversion.py
	84	gfapy/line/comment/writer.py
	85	gfapy/line/common/__init__.py
	86	gfapy/line/common/cloning.py
	87	gfapy/line/common/connection.py
	88	gfapy/line/common/construction.py
	89	gfapy/line/common/default_record_definition.py
	90	gfapy/line/common/disconnection.py
	91	gfapy/line/common/dynamic_fields.py
	92	gfapy/line/common/equivalence.py
	93	gfapy/line/common/field_data.py
	94	gfapy/line/common/field_datatype.py
	95	gfapy/line/common/update_references.py
	96	gfapy/line/common/validate.py
	97	gfapy/line/common/version_conversion.py
	98	gfapy/line/common/virtual_to_real.py
	99	gfapy/line/common/writer.py
	100	gfapy/line/custom_record/__init__.py
	101	gfapy/line/custom_record/construction.py
	102	gfapy/line/custom_record/custom_record.py
	103	gfapy/line/edge/__init__.py
	104	gfapy/line/edge/edge.py
	105	gfapy/line/edge/common/__init__.py
	106	gfapy/line/edge/common/alignment_type.py
	107	gfapy/line/edge/common/from_to.py
	108	gfapy/line/edge/containment/__init__.py
	109	gfapy/line/edge/containment/canonical.py
	110	gfapy/line/edge/containment/containment.py
	111	gfapy/line/edge/containment/pos.py
	112	gfapy/line/edge/containment/to_gfa2.py
	113	gfapy/line/edge/gfa1/__init__.py
	114	gfapy/line/edge/gfa1/alignment_type.py
	115	gfapy/line/edge/gfa1/oriented_segments.py
	116	gfapy/line/edge/gfa1/other.py
	117	gfapy/line/edge/gfa1/references.py
	118	gfapy/line/edge/gfa1/to_gfa2.py
	119	gfapy/line/edge/gfa2/__init__.py
	120	gfapy/line/edge/gfa2/alignment_type.py
	121	gfapy/line/edge/gfa2/gfa2.py
	122	gfapy/line/edge/gfa2/other.py
	123	gfapy/line/edge/gfa2/references.py
	124	gfapy/line/edge/gfa2/to_gfa1.py
	125	gfapy/line/edge/gfa2/validation.py
	126	gfapy/line/edge/link/__init__.py
	127	gfapy/line/edge/link/canonical.py
	128	gfapy/line/edge/link/complement.py
	129	gfapy/line/edge/link/equivalence.py
	130	gfapy/line/edge/link/link.py
	131	gfapy/line/edge/link/references.py
	132	gfapy/line/edge/link/to_gfa2.py
	133	gfapy/line/fragment/__init__.py
	134	gfapy/line/fragment/fragment.py
	135	gfapy/line/fragment/references.py
	136	gfapy/line/fragment/validation.py
	137	gfapy/line/gap/__init__.py
	138	gfapy/line/gap/gap.py
	139	gfapy/line/gap/references.py
	140	gfapy/line/group/__init__.py
	141	gfapy/line/group/group.py
	142	gfapy/line/group/gfa2/__init__.py
	143	gfapy/line/group/gfa2/references.py
	144	gfapy/line/group/gfa2/same_id.py
	145	gfapy/line/group/ordered/__init__.py
	146	gfapy/line/group/ordered/captured_path.py
	147	gfapy/line/group/ordered/ordered.py
	148	gfapy/line/group/ordered/references.py
	149	gfapy/line/group/ordered/to_gfa1.py
	150	gfapy/line/group/path/__init__.py
	151	gfapy/line/group/path/captured_path.py
	152	gfapy/line/group/path/path.py
	153	gfapy/line/group/path/references.py
	154	gfapy/line/group/path/to_gfa2.py
	155	gfapy/line/group/path/topology.py
	156	gfapy/line/group/path/validation.py
	157	gfapy/line/group/unordered/__init__.py
	158	gfapy/line/group/unordered/induced_set.py
	159	gfapy/line/group/unordered/references.py
	160	gfapy/line/group/unordered/unordered.py
	161	gfapy/line/header/__init__.py
	162	gfapy/line/header/connection.py
	163	gfapy/line/header/field_data.py
	164	gfapy/line/header/header.py
	165	gfapy/line/header/multiline.py
	166	gfapy/line/header/version_conversion.py
	167	gfapy/line/segment/__init__.py
	168	gfapy/line/segment/coverage.py
	169	gfapy/line/segment/gfa1.py
	170	gfapy/line/segment/gfa1_to_gfa2.py
	171	gfapy/line/segment/gfa2.py
	172	gfapy/line/segment/gfa2_to_gfa1.py
	173	gfapy/line/segment/length_gfa1.py
	174	gfapy/line/segment/references.py
	175	gfapy/line/segment/segment.py
	176	gfapy/line/segment/writer_wo_sequence.py
	177	gfapy/line/unknown/__init__.py
	178	gfapy/line/unknown/unknown.py
	179	gfapy/lines/__init__.py
	180	gfapy/lines/collections.py
	181	gfapy/lines/creators.py
	182	gfapy/lines/destructors.py
	183	gfapy/lines/finders.py
	184	gfapy/lines/headers.py
	185	gfapy/lines/lines.py
	186	manual/gfapy-manual.pdf
	187	tests/__init__.py
	188	tests/extension.py
	189	tests/test_api_alignment.py
	190	tests/test_api_comments.py
	191	tests/test_api_custom_records.py
	192	tests/test_api_extensions.py
	193	tests/test_api_gfa1_lines.py
	194	tests/test_api_gfa2_lines.py
	195	tests/test_api_gfa_basics.py
	196	tests/test_api_groups_validation.py
	197	tests/test_api_header.py
	198	tests/test_api_linear_paths.py
	199	tests/test_api_linear_paths_extended.py
	200	tests/test_api_lines_collections.py
	201	tests/test_api_lines_creators.py
	202	tests/test_api_lines_destructors.py
	203	tests/test_api_lines_finders.py
	204	tests/test_api_multiplication.py
	205	tests/test_api_placeholders.py
	206	tests/test_api_positionals.py
	207	tests/test_api_positions.py
	208	tests/test_api_references_edge_gfa1.py
	209	tests/test_api_references_edge_gfa2.py
	210	tests/test_api_references_f_g_lines.py
	211	tests/test_api_references_groups.py
	212	tests/test_api_references_virtual.py
	213	tests/test_api_rename_lines.py
	214	tests/test_api_rgfa.py
	215	tests/test_api_tags.py
	216	tests/test_api_version.py
	217	tests/test_api_version_conversion.py
	218	tests/test_gfapy_alignment.py
	219	tests/test_gfapy_byte_array.py
	220	tests/test_gfapy_cigar.py
	221	tests/test_gfapy_line_containment.py
	222	tests/test_gfapy_line_edge.py
	223	tests/test_gfapy_line_header.py
	224	tests/test_gfapy_line_link.py
	225	tests/test_gfapy_line_path.py
	226	tests/test_gfapy_line_segment.py
	227	tests/test_gfapy_line_version.py
	228	tests/test_gfapy_numeric_array.py
	229	tests/test_gfapy_segment_references.py
	230	tests/test_gfapy_sequence.py
	231	tests/test_gfapy_trace.py
	232	tests/test_graphop_artifacts.py
	233	tests/test_graphop_copy_number.py
	234	tests/test_internals_field_parser.py
	235	tests/test_internals_field_validator.py
	236	tests/test_internals_field_writer.py
	237	tests/test_internals_tag_datatype.py
	238	tests/test_unit_alignment.py
	239	tests/test_unit_field_array.py
	240	tests/test_unit_gfa_lines.py
	241	tests/test_unit_header.py
	242	tests/test_unit_line.py
	243	tests/test_unit_line_cloning.py
	244	tests/test_unit_line_connection.py
	245	tests/test_unit_line_dynamic_fields.py
	246	tests/test_unit_line_equivalence.py
	247	tests/test_unit_lines_finders.py
	248	tests/test_unit_multiplication.py
	249	tests/test_unit_numeric_array.py
	250	tests/test_unit_oriented_line.py
	251	tests/test_unit_segment_end.py
	252	tests/test_unit_symbol_invert.py
	253	tests/test_unit_unknown.py
	254	tests/testdata/all_line_types.gfa1.gfa
	255	tests/testdata/all_line_types.gfa2.gfa
	256	tests/testdata/copynum.1.gfa
	257	tests/testdata/copynum.1.gfa2
	258	tests/testdata/copynum.2.gfa
	259	tests/testdata/copynum.2.gfa2
	260	tests/testdata/dead_ends.gfa
	261	tests/testdata/dead_ends.gfa2
	262	tests/testdata/example1.gfa
	263	tests/testdata/example1.gfa2
	264	tests/testdata/example_from_spec.gfa
	265	tests/testdata/example_from_spec.gfa2
	266	tests/testdata/example_from_spec.path14.seq
	267	tests/testdata/example_from_spec2.gfa
	268	tests/testdata/example_from_spec2.gfa2
	269	tests/testdata/filled.gfa1
	270	tests/testdata/filled.gfa2
	271	tests/testdata/gfa2_edges_classification.gfa
	272	tests/testdata/invalid_path.gfa2
	273	tests/testdata/linear_blunt.gfa1
	274	tests/testdata/linear_blunt.gfa2
	275	tests/testdata/linear_merging.1.gfa
	276	tests/testdata/linear_merging.1.gfa2
	277	tests/testdata/linear_merging.2.gfa
	278	tests/testdata/linear_merging.2.gfa2
	279	tests/testdata/linear_merging.3.gfa
	280	tests/testdata/linear_merging.3.gfa2
	281	tests/testdata/linear_merging.4.gfa
	282	tests/testdata/linear_merging.4.gfa2
	283	tests/testdata/linear_merging.5.gfa
	284	tests/testdata/linear_merging.5.gfa2
	285	tests/testdata/linear_merging.6.gfa
	286	tests/testdata/linear_merging.6.merged.gfa
	287	tests/testdata/links_distri.l1.gfa
	288	tests/testdata/links_distri.l1.gfa2
	289	tests/testdata/links_distri.l1.m2.gfa
	290	tests/testdata/links_distri.l1.m2.gfa2
	291	tests/testdata/links_distri.l2.gfa
	292	tests/testdata/links_distri.l2.gfa2
	293	tests/testdata/links_distri.l2.m2.gfa
	294	tests/testdata/links_distri.l2.m2.gfa2
	295	tests/testdata/links_distri.l2.m2.no_ld.gfa
	296	tests/testdata/links_distri.l2.m2.no_ld.gfa2
	297	tests/testdata/links_distri.l2.m3.gfa
	298	tests/testdata/links_distri.l2.m3.gfa2
	299	tests/testdata/links_distri.l2.m3.no_ld.gfa
	300	tests/testdata/links_distri.l2.m3.no_ld.gfa2
	301	tests/testdata/links_distri.l3.gfa
	302	tests/testdata/links_distri.l3.gfa2
	303	tests/testdata/links_distri.l3.m2.gfa
	304	tests/testdata/links_distri.l3.m2.gfa2
	305	tests/testdata/links_distri.l3.m2.no_ld.gfa
	306	tests/testdata/links_distri.l3.m2.no_ld.gfa2
	307	tests/testdata/loop.gfa
	308	tests/testdata/loop.gfa2
	309	tests/testdata/rgfa_example.1.gfa
	310	tests/testdata/rgfa_example.2.gfa
	311	tests/testdata/sample.gfa
	312	tests/testdata/sample.gfa2
	313	tests/testdata/seq_to_fill.fas
	314	tests/testdata/spec_q1.gfa
	315	tests/testdata/spec_q1.gfa2
	316	tests/testdata/spec_q2.gfa
	317	tests/testdata/spec_q2.gfa2
	318	tests/testdata/spec_q2.path_circular.seq
	319	tests/testdata/spec_q2.path_linear.seq
	320	tests/testdata/spec_q3.gfa
	321	tests/testdata/spec_q3.gfa2
	322	tests/testdata/spec_q4.gfa
	323	tests/testdata/spec_q4.gfa2
	324	tests/testdata/spec_q4.path_more_than_circular.seq
	325	tests/testdata/spec_q5.gfa
	326	tests/testdata/spec_q5.gfa2
	327	tests/testdata/spec_q6.gfa
	328	tests/testdata/spec_q6.gfa2
	329	tests/testdata/spec_q7.gfa
	330	tests/testdata/spec_q7.gfa2
	331	tests/testdata/to_be_filled.gfa1
	332	tests/testdata/to_be_filled.gfa2
	333	tests/testdata/two_components.gfa
	334	tests/testdata/two_components.gfa2
	335	tests/testdata/unnamed_and_named_links.gfa
	336	tests/testdata/unnamed_link.gfa
	337	tests/testdata/valid_path.gfa2⏎

+1

-0

gfapy.egg-info/dependency_links.txt less more

0

+1

-0

gfapy.egg-info/not-zip-safe less more

0

+2

-0

gfapy.egg-info/top_level.txt less more

	0	gfapy
	1	tests

manual/gfapy-manual.pdf less more

Binary diff not shown

+5

-0

setup.cfg less more

0	0	[bdist_wheel]
1	1	python-tag = py3
	2
	3	[egg_info]
	4	tag_build =
	5	tag_date = 0
	6

+0

-32

~~tests/testdata/invalid/edge_missing.gfa2~~ less more

0		# File used for the collections test
1		# similar but NOT equivalent to the gfa1 file!
2		S 1 122 *
3		S 3 29 TGCTAGCTGACTGTCGATGCTGTGTG
4		E 1_to_2 1+ 2+ 110 122$ 0 12 12M
5		S 5 130 *
6		S 13 150 *
7		E 2_to_6 2+ 6+ 0 122$ 10 132 122M
8		O 14 11+ 12+
9		S 11 140 * xx:i:11
10		F 2 read1+ 0 42 12 55 * id:Z:read1_in_2
11		F 2 read2+ 45 62 0 18 * id:Z:read2_in_2
12		U 16 1 3 15 2_to_6 16sub
13		H ac:Z:test2
14		# another comment
15		S 12 150 *
16		S 4 120 *
17		H VN:Z:2.0
18		E 1_to_3 1+ 3+ 112 122$ 0 12 10M
19		G 1_to_11 1+ 11- 120 *
20		E 11_to_12 11+ 12+ 18 140$ 0 122 122M
21		S 6 150 *
22		X custom_record xx:Z:testtag
23		X custom_record X2
24		G 2_to_12 2- 12+ 500 50
25		O 15 11+ 11_to_13+ 13+ xx:i:-1
26		Y another_custom_record
27		U 16sub 2 3
28		S 2 120 * xx:Z:sometag
29		H aa:i:12 ab:Z:test1
30		H aa:i:15
31		E 1_to_5 1+ 5+ 0 122$ 2 124 * zz:Z:tag

+0

-12

~~tests/testdata/invalid/edge_wrong_lastpos.gfa2~~ less more

0		H VN:Z:2.0
1		H ul:Z:https://github.com/sjackman/assembly-graph/blob/master/sample.gfa
2		S 1 8 CGATGCAA
3		S 2 10 TGCAAAGTAC
4		S 3 21 TGCAACGTATAGACTTGTCAC RC:i:4
5		S 4 7 GCATATA
6		S 5 8 CGATGATA
7		S 6 4 ATGA
8		E * 1+ 2+ 3 9$ 0 5 5M
9		E * 3+ 2+ 21$ 21$ 0 0 0M
10		E * 3+ 4- 17 21$ 3 7$ 1M1D2M
11		E * 4- 5+ 0 0 0 0 0M

+0

-33

~~tests/testdata/invalid/fragment_wrong_lastpos.gfa2~~ less more

0		# File used for the collections test
1		# similar but NOT equivalent to the gfa1 file!
2		S 1 122 *
3		S 3 29 TGCTAGCTGACTGTCGATGCTGTGTG
4		E 1_to_2 1+ 2+ 110 122$ 0 12 12M
5		S 5 130 *
6		S 13 150 *
7		E 2_to_6 2+ 6+ 0 122$ 10 132 122M
8		O 14 11+ 12+
9		S 11 140 * xx:i:11
10		F 3 read1+ 0 42$ 12 55 * id:Z:read1_in_3
11		F 2 read2+ 45 62 0 18 * id:Z:read2_in_2
12		U 16 1 3 15 2_to_6 16sub
13		H ac:Z:test2
14		# another comment
15		S 12 150 *
16		S 4 120 *
17		H VN:Z:2.0
18		E 1_to_3 1+ 3+ 112 122$ 0 12 10M
19		G 1_to_11 1+ 11- 120 *
20		E 11_to_12 11+ 12+ 18 140$ 0 122 122M
21		S 6 150 *
22		X custom_record xx:Z:testtag
23		X custom_record X2
24		E 11_to_13 11+ 13+ 20 140$ 0 120 120M
25		G 2_to_12 2- 12+ 500 50
26		O 15 11+ 11_to_13+ 13+ xx:i:-1
27		Y another_custom_record
28		U 16sub 2 3
29		S 2 120 * xx:Z:sometag
30		H aa:i:12 ab:Z:test1
31		H aa:i:15
32		E 1_to_5 1+ 5+ 0 122$ 2 124 * zz:Z:tag

+0

-12

~~tests/testdata/invalid/inconsistent_length.gfa1~~ less more

0		H VN:Z:1.0
1		H ul:Z:https://github.com/sjackman/assembly-graph/blob/master/sample.gfa
2		S 1 CGATGCAA LN:i:12
3		S 2 TGCAAAGTAC
4		S 3 TGCAACGTATAGACTTGTCAC RC:i:4
5		S 4 GCATATA
6		S 5 CGATGATA
7		S 6 ATGA
8		L 1 + 2 + 5M
9		L 3 + 2 + 0M
10		L 3 + 4 - 1M1D2M1S
11		L 4 - 5 + 0M

+0

-21

~~tests/testdata/invalid/link_missing.gfa1~~ less more

0		# File used for the collections test
1		S 1 *
2		S 3 CGATGCTAGCTGACTGTCGATGCTGTGTG
3		L 1 + 2 + 12M ID:Z:1_to_2
4		S 5 *
5		S 13 *
6		C 2 + 6 + 10 122M ID:Z:2_to_6
7		P 14 11+,12+ 122M
8		S 11 *
9		H ac:Z:test2
10		S 12 *
11		S 4 *
12		H VN:Z:1.0
13		L 1 + 3 + 12M ID:Z:1_to_3
14		S 6 *
15		L 11 + 13 + 120M ID:Z:11_to_13
16		P 15 11+,13+ 120M
17		S 2 * xx:Z:sometag
18		H aa:i:12 ab:Z:test1
19		H aa:i:15
20		C 1 + 5 + 12 120M ID:Z:1_to_5

+0

-21

~~tests/testdata/invalid/segment_missing.gfa1~~ less more

0		# comment
1		S 3 CGATGCTAGCTGACTGTCGATGCTGTGTG
2		L 1 + 2 + 12M ID:Z:1_to_2
3		S 5 *
4		S 13 *
5		C 2 + 6 + 10 122M ID:Z:2_to_6
6		P 14 11+,12+ 122M
7		S 11 *
8		H ac:Z:test2
9		S 12 *
10		S 4 *
11		H VN:Z:1.0
12		L 1 + 3 + 12M ID:Z:1_to_3
13		L 11 + 12 + 122M ID:Z:11_to_12
14		S 6 *
15		L 11 + 13 + 120M ID:Z:11_to_13
16		P 15 11+,13+ 120M
17		S 2 * xx:Z:sometag
18		H aa:i:12 ab:Z:test1
19		H aa:i:15
20		C 1 + 5 + 12 120M ID:Z:1_to_5

+0

-32

~~tests/testdata/invalid/segment_missing.gfa2~~ less more

0		# File used for the collections test
1		# similar but NOT equivalent to the gfa1 file!
2		S 3 29 TGCTAGCTGACTGTCGATGCTGTGTG
3		E 1_to_2 1+ 2+ 110 122$ 0 12 12M
4		S 5 130 *
5		S 13 150 *
6		E 2_to_6 2+ 6+ 0 122$ 10 132 122M
7		O 14 11+ 12+
8		S 11 140 * xx:i:11
9		F 2 read1+ 0 42 12 55 * id:Z:read1_in_2
10		F 2 read2+ 45 62 0 18 * id:Z:read2_in_2
11		U 16 1 3 15 2_to_6 16sub
12		H ac:Z:test2
13		# another comment
14		S 12 150 *
15		S 4 120 *
16		H VN:Z:2.0
17		E 1_to_3 1+ 3+ 112 122$ 0 12 10M
18		G 1_to_11 1+ 11- 120 *
19		E 11_to_12 11+ 12+ 18 140$ 0 122 122M
20		S 6 150 *
21		X custom_record xx:Z:testtag
22		X custom_record X2
23		E 11_to_13 11+ 13+ 20 140$ 0 120 120M
24		G 2_to_12 2- 12+ 500 50
25		O 15 11+ 11_to_13+ 13+ xx:i:-1
26		Y another_custom_record
27		U 16sub 2 3
28		S 2 120 * xx:Z:sometag
29		H aa:i:12 ab:Z:test1
30		H aa:i:15
31		E 1_to_5 1+ 5+ 0 122$ 2 124 * zz:Z:tag