New Upstream Release - minimap2

Ready changes

Summary

Merged new upstream version: 2.26+dfsg (was: 2.24+dfsg).

Resulting package

Built on 2023-05-01T18:07 (took 17m2s)

The resulting binary packages can be installed (if you have the apt repository enabled) by running one of:

apt install -t fresh-releases libminimap2-devapt install -t fresh-releases minimap2-dbgsymapt install -t fresh-releases minimap2apt install -t fresh-releases python3-mappy-dbgsymapt install -t fresh-releases python3-mappy

Lintian Result

Diff

diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index b43ca9c..0000000
--- a/.travis.yml
+++ /dev/null
@@ -1,24 +0,0 @@
-matrix:
-  include:
-    - language: c
-      compiler: gcc
-      script: make
-    - language: c
-      compiler: clang
-      script: make
-    - arch: arm64
-      language: c
-      compiler: gcc
-      script: make arm_neon=1 aarch64=1
-    - language: python
-      python: "2.7"
-      before_install: pip install cython
-      script: python setup.py build_ext
-    - language: python
-      python: "3.5"
-      before_install: pip install cython
-      script: python setup.py build_ext
-    - language: python
-      python: "3.9"
-      before_install: pip install cython
-      script: python setup.py build_ext
diff --git a/Makefile b/Makefile
index 4118616..17b13b6 100644
--- a/Makefile
+++ b/Makefile
@@ -8,6 +8,10 @@ PROG=		minimap2
 PROG_EXTRA=	sdust minimap2-lite
 LIBS=		-lm -lz -lpthread
 
+ifneq ($(aarch64),)
+	arm_neon=1
+endif
+
 ifeq ($(arm_neon),) # if arm_neon is not defined
 ifeq ($(sse2only),) # if sse2only is not defined
 	OBJS+=ksw2_extz2_sse41.o ksw2_extd2_sse41.o ksw2_exts2_sse41.o ksw2_extz2_sse2.o ksw2_extd2_sse2.o ksw2_exts2_sse2.o ksw2_dispatch.o
@@ -26,12 +30,12 @@ endif
 
 ifneq ($(asan),)
 	CFLAGS+=-fsanitize=address
-	LIBS+=-fsanitize=address
+	LIBS+=-fsanitize=address -ldl
 endif
 
 ifneq ($(tsan),)
 	CFLAGS+=-fsanitize=thread
-	LIBS+=-fsanitize=thread
+	LIBS+=-fsanitize=thread -ldl
 endif
 
 .PHONY:all extra clean depend
diff --git a/NEWS.md b/NEWS.md
index 5de4ea6..ff0dfbd 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,50 @@
+Release 2.26-r1175 (29 April 2023)
+----------------------------------
+
+Fixed the broken Python package. This is the only change.
+
+(2.25: 25 April 2023, r1173)
+
+
+
+Release 2.25-r1173 (25 April 2023)
+----------------------------------
+
+Notable changes:
+
+ * Improvement: use the miniprot splice model for RNA-seq alignment by default.
+   This model considers non-GT-AG splice sites and leads to slightly higher
+   (<0.1%) accuracy and sensitivity on real human data.
+
+ * Change: increased the default `-I` to `8G` such that minimap2 would create a
+   uni-part index for a pair of mammalian genomes. This change may increase the
+   memory for all-vs-all read overlap alignment given large datasets.
+
+ * New feature: output the sequences in secondary alignments with option
+   `--secondary-seq` (#687).
+
+ * Bugfix: --rmq was not parsed correctly (#1010)
+
+ * Bugfix: possibly incorrect coordinate when applying end bonus to the target
+   sequence (#1025). This is a ksw2 bug. It does not affect minimap2 as
+   minimap2 is not using the affected feature.
+
+ * Improvement: incorporated several changes for better compatibility with
+   Windows (#1051) and for minimap2 integration at Oxford Nanopore Technologies
+   (#1048 and #1033).
+
+ * Improvement: output the HD-line in SAM output (#1019).
+
+ * Improvement: check minimap2 index file in mappy to prevent segmentation
+   fault for certain indices (#1008).
+
+For genomic sequences, minimap2 should give identical output to v2.24.
+Long-read RNA-seq alignment may occasionally differ from previous versions.
+
+(2.25: 25 April 2023, r1173)
+
+
+
 Release 2.24-r1122 (26 December 2021)
 -------------------------------------
 
diff --git a/README.md b/README.md
index 13c4938..4bfdae1 100644
--- a/README.md
+++ b/README.md
@@ -74,8 +74,8 @@ Detailed evaluations are available from the [minimap2 paper][doi] or the
 Minimap2 is optimized for x86-64 CPUs. You can acquire precompiled binaries from
 the [release page][release] with:
 ```sh
-curl -L https://github.com/lh3/minimap2/releases/download/v2.24/minimap2-2.24_x64-linux.tar.bz2 | tar -jxvf -
-./minimap2-2.24_x64-linux/minimap2
+curl -L https://github.com/lh3/minimap2/releases/download/v2.26/minimap2-2.26_x64-linux.tar.bz2 | tar -jxvf -
+./minimap2-2.26_x64-linux/minimap2
 ```
 If you want to compile from the source, you need to have a C compiler, GNU make
 and zlib development files installed. Then type `make` in the source code
@@ -350,6 +350,11 @@ If you use minimap2 in your work, please cite:
 > Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences.
 > *Bioinformatics*, **34**:3094-3100. [doi:10.1093/bioinformatics/bty191][doi]
 
+and/or:
+
+> Li, H. (2021). New strategies to improve minimap2 alignment accuracy.
+> *Bioinformatics*, **37**:4572-4574. [doi:10.1093/bioinformatics/btab705][doi2]
+
 ## <a name="dguide"></a>Developers' Guide
 
 Minimap2 is not only a command line tool, but also a programming library.
@@ -399,5 +404,6 @@ mappy` or [from BioConda][mappyconda] via `conda install -c bioconda mappy`.
 [manpage]: https://lh3.github.io/minimap2/minimap2.html
 [manpage-cs]: https://lh3.github.io/minimap2/minimap2.html#10
 [doi]: https://doi.org/10.1093/bioinformatics/bty191
-[smide]: https://github.com/nemequ/simde
+[doi2]: https://doi.org/10.1093/bioinformatics/btab705
+[simde]: https://github.com/nemequ/simde
 [unimap]: https://github.com/lh3/unimap
diff --git a/align.c b/align.c
index ddbb0bd..316af78 100644
--- a/align.c
+++ b/align.c
@@ -326,9 +326,11 @@ static void mm_align_pair(void *km, const mm_mapopt_t *opt, int qlen, const uint
 	if (opt->max_sw_mat > 0 && (int64_t)tlen * qlen > opt->max_sw_mat) {
 		ksw_reset_extz(ez);
 		ez->zdropped = 1;
-	} else if (opt->flag & MM_F_SPLICE)
-		ksw_exts2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, opt->q2, opt->noncan, zdrop, opt->junc_bonus, flag, junc, ez);
-	else if (opt->q == opt->q2 && opt->e == opt->e2)
+	} else if (opt->flag & MM_F_SPLICE) {
+		int flag_tmp = flag;
+		if (!(opt->flag & MM_F_SPLICE_OLD)) flag_tmp |= KSW_EZ_SPLICE_CMPLX;
+		ksw_exts2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, opt->q2, opt->noncan, zdrop, opt->junc_bonus, flag_tmp, junc, ez);
+	} else if (opt->q == opt->q2 && opt->e == opt->e2)
 		ksw_extz2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, w, zdrop, end_bonus, flag, ez);
 	else
 		ksw_extd2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, opt->q2, opt->e2, w, zdrop, end_bonus, flag, ez);
diff --git a/cookbook.md b/cookbook.md
index 8b3c105..428f117 100644
--- a/cookbook.md
+++ b/cookbook.md
@@ -31,8 +31,8 @@ To acquire the data used in this cookbook and to install minimap2 and paftools,
 please follow the command lines below:
 ```sh
 # install minimap2 executables
-curl -L https://github.com/lh3/minimap2/releases/download/v2.24/minimap2-2.24_x64-linux.tar.bz2 | tar jxf -
-cp minimap2-2.24_x64-linux/{minimap2,k8,paftools.js} .  # copy executables
+curl -L https://github.com/lh3/minimap2/releases/download/v2.26/minimap2-2.26_x64-linux.tar.bz2 | tar jxf -
+cp minimap2-2.26_x64-linux/{minimap2,k8,paftools.js} .  # copy executables
 export PATH="$PATH:"`pwd`                               # put the current directory on PATH
 # download example datasets
 curl -L https://github.com/lh3/minimap2/releases/download/v2.10/cookbook-data.tgz | tar zxf -
diff --git a/debian/changelog b/debian/changelog
index 473ed03..3f02021 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+minimap2 (2.26+dfsg-1) UNRELEASED; urgency=low
+
+  * New upstream release.
+
+ -- Debian Janitor <janitor@jelmer.uk>  Mon, 01 May 2023 17:51:00 -0000
+
 minimap2 (2.24+dfsg-3) unstable; urgency=medium
 
   * Fix watch file
diff --git a/debian/patches/ar.patch b/debian/patches/ar.patch
index 3ddc3de..e95b84f 100644
--- a/debian/patches/ar.patch
+++ b/debian/patches/ar.patch
@@ -2,9 +2,11 @@ Author: Steffen Moeller
 Last-Update: 2020-03-15 04:06:20 +0100
 Description: Fix ar call to support debugging symbols properly
 
---- minimap2.orig/Makefile
-+++ minimap2/Makefile
-@@ -51,7 +51,8 @@
+Index: minimap2.git/Makefile
+===================================================================
+--- minimap2.git.orig/Makefile
++++ minimap2.git/Makefile
+@@ -55,7 +55,8 @@ minimap2-lite:example.o libminimap2.a
  		$(CC) $(CFLAGS) $< -o $@ -L. -lminimap2 $(LIBS) $(LDFLAGS)
  
  libminimap2.a:$(OBJS)
diff --git a/debian/patches/do_not_use_natbib.bst.patch b/debian/patches/do_not_use_natbib.bst.patch
index 405114f..d7a647c 100644
--- a/debian/patches/do_not_use_natbib.bst.patch
+++ b/debian/patches/do_not_use_natbib.bst.patch
@@ -2,8 +2,10 @@ Author: Andreas Tille <tille@debian.org>
 Last-Update: Fri, 09 Nov 2018 23:36:05 +0100
 Description: Natbib style is only for non-profit use
 Forwarded: not-needed
---- a/tex/bioinfo.cls
-+++ b/tex/bioinfo.cls
+Index: minimap2.git/tex/bioinfo.cls
+===================================================================
+--- minimap2.git.orig/tex/bioinfo.cls
++++ minimap2.git/tex/bioinfo.cls
 @@ -824,7 +824,8 @@
        \setlength{\itemindent}{-4mm}}\small}
  {\end{list}\endgroup}
diff --git a/debian/patches/hardening.patch b/debian/patches/hardening.patch
index 572b28c..b90b0bb 100644
--- a/debian/patches/hardening.patch
+++ b/debian/patches/hardening.patch
@@ -2,8 +2,10 @@ Author: Andreas Tille <tille@debian.org>
 Last-Update: Fri, 17 Aug 2018 11:04:09 +0200
 Description: Propagate hardening options
 
---- minimap2.orig/Makefile
-+++ minimap2/Makefile
+Index: minimap2.git/Makefile
+===================================================================
+--- minimap2.git.orig/Makefile
++++ minimap2.git/Makefile
 @@ -1,5 +1,5 @@
 -CFLAGS=		-g -Wall -O2 -Wc++-compat #-Wextra
 -CPPFLAGS=	-DHAVE_KALLOC
@@ -12,7 +14,7 @@ Description: Propagate hardening options
  INCLUDES=
  OBJS=		kthread.o kalloc.o misc.o bseq.o sketch.o sdust.o options.o index.o \
  			lchain.o align.o hit.o seed.o map.o format.o pe.o esterr.o splitidx.o \
-@@ -45,16 +45,16 @@
+@@ -49,16 +49,16 @@ all:$(PROG)
  extra:all $(PROG_EXTRA)
  
  minimap2:main.o libminimap2.a
diff --git a/debian/patches/link_mappy_to_libminimap.patch b/debian/patches/link_mappy_to_libminimap.patch
index 77aa40b..0d1839d 100644
--- a/debian/patches/link_mappy_to_libminimap.patch
+++ b/debian/patches/link_mappy_to_libminimap.patch
@@ -3,9 +3,11 @@ Last-Update: Fri, 15 May 2020 14:00:54 +0200
 Description: Add libminimap to Python library
  FIXME: seems extra_link_args and extra_objects are ignored both
 
---- minimap2.orig/setup.py
-+++ minimap2/setup.py
-@@ -9,6 +9,8 @@
+Index: minimap2.git/setup.py
+===================================================================
+--- minimap2.git.orig/setup.py
++++ minimap2.git/setup.py
+@@ -9,6 +9,8 @@ import sys, platform
  sys.path.append('python')
  
  extra_compile_args = ['-DHAVE_KALLOC']
diff --git a/debian/patches/python-sse4-arch.patch b/debian/patches/python-sse4-arch.patch
index 1b6d2f7..f124163 100644
--- a/debian/patches/python-sse4-arch.patch
+++ b/debian/patches/python-sse4-arch.patch
@@ -3,9 +3,11 @@ From: Michael Hudson-Doyle <michael.hudson@canonical.com>
 Bug-Debian: https://bugs.debian.org/969596
 Description: Only pass -msse4.1 to the compiler on amd64
 
---- minimap2.orig/setup.py
-+++ minimap2/setup.py
-@@ -5,6 +5,7 @@
+Index: minimap2.git/setup.py
+===================================================================
+--- minimap2.git.orig/setup.py
++++ minimap2.git/setup.py
+@@ -5,6 +5,7 @@ except ImportError:
  	from distutils.extension import Extension
  
  import sys, platform
@@ -13,7 +15,7 @@ Description: Only pass -msse4.1 to the compiler on amd64
  
  sys.path.append('python')
  
-@@ -13,11 +14,11 @@
+@@ -13,11 +14,11 @@ extra_compile_args = ['-DHAVE_KALLOC']
  extra_objects = ['libminimap2.a']
  include_dirs = ["."]
  
diff --git a/debian/patches/reenable-simde.patch b/debian/patches/reenable-simde.patch
index f35b363..45bfe32 100644
--- a/debian/patches/reenable-simde.patch
+++ b/debian/patches/reenable-simde.patch
@@ -1,5 +1,7 @@
---- a/ksw2_extz2_sse.c
-+++ b/ksw2_extz2_sse.c
+Index: minimap2.git/ksw2_extz2_sse.c
+===================================================================
+--- minimap2.git.orig/ksw2_extz2_sse.c
++++ minimap2.git/ksw2_extz2_sse.c
 @@ -2,31 +2,13 @@
  #include <assert.h>
  #include "ksw2.h"
@@ -36,7 +38,7 @@
  #else
  void ksw_extz2_sse(void *km, int qlen, const uint8_t *query, int tlen, const uint8_t *target, int8_t m, const int8_t *mat, int8_t q, int8_t e, int w, int zdrop, int end_bonus, int flag, ksw_extz_t *ez)
  #endif // ~KSW_CPU_DISPATCH
-@@ -137,13 +119,8 @@
+@@ -137,13 +119,8 @@ void ksw_extz2_sse(void *km, int qlen, c
  				st = _mm_loadu_si128((__m128i*)&qrr[t]);
  				mask = _mm_or_si128(_mm_cmpeq_epi8(sq, m1_), _mm_cmpeq_epi8(st, m1_));
  				tmp = _mm_cmpeq_epi8(sq, st);
@@ -50,7 +52,7 @@
  				_mm_storeu_si128((__m128i*)((uint8_t*)s + t), tmp);
  			}
  		} else {
-@@ -159,22 +136,10 @@
+@@ -159,22 +136,10 @@ void ksw_extz2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i z, a, b, xt1, vt1, ut, tmp;
  				__dp_code_block1;
@@ -73,7 +75,7 @@
  			}
  		} else if (!(flag&KSW_EZ_RIGHT)) { // gap left-alignment
  			__m128i *pr = p + (size_t)r * n_col_ - st_;
-@@ -183,16 +148,9 @@
+@@ -183,16 +148,9 @@ void ksw_extz2_sse(void *km, int qlen, c
  				__m128i d, z, a, b, xt1, vt1, ut, tmp;
  				__dp_code_block1;
  				d = _mm_and_si128(_mm_cmpgt_epi8(a, z), flag1_); // d = a > z? 1 : 0
@@ -90,7 +92,7 @@
  				__dp_code_block2;
  				tmp = _mm_cmpgt_epi8(a, zero_);
  				_mm_store_si128(&x[t], _mm_and_si128(tmp, a));
-@@ -209,16 +167,9 @@
+@@ -209,16 +167,9 @@ void ksw_extz2_sse(void *km, int qlen, c
  				__m128i d, z, a, b, xt1, vt1, ut, tmp;
  				__dp_code_block1;
  				d = _mm_andnot_si128(_mm_cmpgt_epi8(z, a), flag1_); // d = z > a? 0 : 1
@@ -107,7 +109,7 @@
  				__dp_code_block2;
  				tmp = _mm_cmpgt_epi8(zero_, a);
  				_mm_store_si128(&x[t], _mm_andnot_si128(tmp, a));
-@@ -249,13 +200,8 @@
+@@ -249,13 +200,8 @@ void ksw_extz2_sse(void *km, int qlen, c
  					_mm_storeu_si128((__m128i*)&H[t], H1);
  					t_ = _mm_set1_epi32(t);
  					tmp = _mm_cmpgt_epi32(H1, max_H_);
@@ -121,14 +123,16 @@
  				}
  				_mm_storeu_si128((__m128i*)HH, max_H_);
  				_mm_storeu_si128((__m128i*)tt, max_t_);
-@@ -310,4 +256,4 @@
+@@ -310,4 +256,4 @@ void ksw_extz2_sse(void *km, int qlen, c
  		kfree(km, mem2); kfree(km, off);
  	}
  }
 -#endif // __SSE2__
 +
---- a/ksw2_extd2_sse.c
-+++ b/ksw2_extd2_sse.c
+Index: minimap2.git/ksw2_extd2_sse.c
+===================================================================
+--- minimap2.git.orig/ksw2_extd2_sse.c
++++ minimap2.git/ksw2_extd2_sse.c
 @@ -3,37 +3,19 @@
  #include <assert.h>
  #include "ksw2.h"
@@ -172,7 +176,7 @@
  {
  #define __dp_code_block1 \
  	z = _mm_load_si128(&s[t]); \
-@@ -169,13 +151,8 @@
+@@ -169,13 +151,8 @@ void ksw_extd2_sse(void *km, int qlen, c
  				st = _mm_loadu_si128((__m128i*)&qrr[t]);
  				mask = _mm_or_si128(_mm_cmpeq_epi8(sq, m1_), _mm_cmpeq_epi8(st, m1_));
  				tmp = _mm_cmpeq_epi8(sq, st);
@@ -186,7 +190,7 @@
  				_mm_storeu_si128((__m128i*)((int8_t*)s + t), tmp);
  			}
  		} else {
-@@ -192,7 +169,6 @@
+@@ -192,7 +169,6 @@ void ksw_extd2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i z, a, b, a2, b2, xt1, x2t1, vt1, ut, tmp;
  				__dp_code_block1;
@@ -194,7 +198,7 @@
  				z = _mm_max_epi8(z, a);
  				z = _mm_max_epi8(z, b);
  				z = _mm_max_epi8(z, a2);
-@@ -203,27 +179,6 @@
+@@ -203,27 +179,6 @@ void ksw_extd2_sse(void *km, int qlen, c
  				_mm_store_si128(&y[t],  _mm_sub_epi8(_mm_max_epi8(b,  zero_), qe_));
  				_mm_store_si128(&x2[t], _mm_sub_epi8(_mm_max_epi8(a2, zero_), qe2_));
  				_mm_store_si128(&y2[t], _mm_sub_epi8(_mm_max_epi8(b2, zero_), qe2_));
@@ -222,7 +226,7 @@
  			}
  		} else if (!(flag&KSW_EZ_RIGHT)) { // gap left-alignment
  			__m128i *pr = p + (size_t)r * n_col_ - st_;
-@@ -231,7 +186,6 @@
+@@ -231,7 +186,6 @@ void ksw_extd2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i d, z, a, b, a2, b2, xt1, x2t1, vt1, ut, tmp;
  				__dp_code_block1;
@@ -230,7 +234,7 @@
  				d = _mm_and_si128(_mm_cmpgt_epi8(a, z), _mm_set1_epi8(1));       // d = a  > z? 1 : 0
  				z = _mm_max_epi8(z, a);
  				d = _mm_blendv_epi8(d, _mm_set1_epi8(2), _mm_cmpgt_epi8(b,  z)); // d = b  > z? 2 : d
-@@ -241,22 +195,6 @@
+@@ -241,22 +195,6 @@ void ksw_extd2_sse(void *km, int qlen, c
  				d = _mm_blendv_epi8(d, _mm_set1_epi8(4), _mm_cmpgt_epi8(b2, z)); // d = a2 > z? 3 : d
  				z = _mm_max_epi8(z, b2);
  				z = _mm_min_epi8(z, sc_mch_);
@@ -253,7 +257,7 @@
  				__dp_code_block2;
  				tmp = _mm_cmpgt_epi8(a, zero_);
  				_mm_store_si128(&x[t],  _mm_sub_epi8(_mm_and_si128(tmp, a),  qe_));
-@@ -278,7 +216,6 @@
+@@ -278,7 +216,6 @@ void ksw_extd2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i d, z, a, b, a2, b2, xt1, x2t1, vt1, ut, tmp;
  				__dp_code_block1;
@@ -261,7 +265,7 @@
  				d = _mm_andnot_si128(_mm_cmpgt_epi8(z, a), _mm_set1_epi8(1));    // d = z > a?  0 : 1
  				z = _mm_max_epi8(z, a);
  				d = _mm_blendv_epi8(_mm_set1_epi8(2), d, _mm_cmpgt_epi8(z, b));  // d = z > b?  d : 2
-@@ -288,22 +225,6 @@
+@@ -288,22 +225,6 @@ void ksw_extd2_sse(void *km, int qlen, c
  				d = _mm_blendv_epi8(_mm_set1_epi8(4), d, _mm_cmpgt_epi8(z, b2)); // d = z > b2? d : 4
  				z = _mm_max_epi8(z, b2);
  				z = _mm_min_epi8(z, sc_mch_);
@@ -284,7 +288,7 @@
  				__dp_code_block2;
  				tmp = _mm_cmpgt_epi8(zero_, a);
  				_mm_store_si128(&x[t],  _mm_sub_epi8(_mm_andnot_si128(tmp, a),  qe_));
-@@ -338,13 +259,8 @@
+@@ -338,13 +259,8 @@ void ksw_extd2_sse(void *km, int qlen, c
  					_mm_storeu_si128((__m128i*)&H[t], H1);
  					t_ = _mm_set1_epi32(t);
  					tmp = _mm_cmpgt_epi32(H1, max_H_);
@@ -298,13 +302,15 @@
  				}
  				_mm_storeu_si128((__m128i*)HH, max_H_);
  				_mm_storeu_si128((__m128i*)tt, max_t_);
-@@ -399,4 +315,3 @@
+@@ -399,4 +315,3 @@ void ksw_extd2_sse(void *km, int qlen, c
  		kfree(km, mem2); kfree(km, off);
  	}
  }
 -#endif // __SSE2__
---- a/ksw2_exts2_sse.c
-+++ b/ksw2_exts2_sse.c
+Index: minimap2.git/ksw2_exts2_sse.c
+===================================================================
+--- minimap2.git.orig/ksw2_exts2_sse.c
++++ minimap2.git/ksw2_exts2_sse.c
 @@ -3,36 +3,19 @@
  #include <assert.h>
  #include "ksw2.h"
@@ -347,7 +353,7 @@
  {
  #define __dp_code_block1 \
  	z = _mm_load_si128(&s[t]); \
-@@ -201,13 +184,8 @@
+@@ -240,13 +223,8 @@ void ksw_exts2_sse(void *km, int qlen, c
  				st = _mm_loadu_si128((__m128i*)&qrr[t]);
  				mask = _mm_or_si128(_mm_cmpeq_epi8(sq, m1_), _mm_cmpeq_epi8(st, m1_));
  				tmp = _mm_cmpeq_epi8(sq, st);
@@ -361,7 +367,7 @@
  				_mm_storeu_si128((__m128i*)((int8_t*)s + t), tmp);
  			}
  		} else {
-@@ -224,7 +202,6 @@
+@@ -263,7 +241,6 @@ void ksw_exts2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i z, a, b, a2, a2a, xt1, x2t1, vt1, ut, tmp;
  				__dp_code_block1;
@@ -369,7 +375,7 @@
  				z = _mm_max_epi8(z, a);
  				z = _mm_max_epi8(z, b);
  				z = _mm_max_epi8(z, a2a);
-@@ -233,23 +210,6 @@
+@@ -272,23 +249,6 @@ void ksw_exts2_sse(void *km, int qlen, c
  				_mm_store_si128(&y[t],  _mm_sub_epi8(_mm_max_epi8(b,  zero_), qe_));
  				tmp = _mm_load_si128(&donor[t]);
  				_mm_store_si128(&x2[t], _mm_sub_epi8(_mm_max_epi8(a2, tmp), q2_));
@@ -393,7 +399,7 @@
  			}
  		} else if (!(flag&KSW_EZ_RIGHT)) { // gap left-alignment
  			__m128i *pr = p + r * n_col_ - st_;
-@@ -257,24 +217,12 @@
+@@ -296,24 +256,12 @@ void ksw_exts2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i d, z, a, b, a2, a2a, xt1, x2t1, vt1, ut, tmp, tmp2;
  				__dp_code_block1;
@@ -418,7 +424,7 @@
  				__dp_code_block2;
  				tmp = _mm_cmpgt_epi8(a, zero_);
  				_mm_store_si128(&x[t],  _mm_sub_epi8(_mm_and_si128(tmp, a),  qe_));
-@@ -285,11 +233,7 @@
+@@ -324,11 +272,7 @@ void ksw_exts2_sse(void *km, int qlen, c
  
  				tmp2 = _mm_load_si128(&donor[t]);
  				tmp = _mm_cmpgt_epi8(a2, tmp2);
@@ -430,7 +436,7 @@
  				_mm_store_si128(&x2[t], _mm_sub_epi8(tmp2, q2_));
  				d = _mm_or_si128(d, _mm_and_si128(tmp, _mm_set1_epi8(0x20)));
  				_mm_store_si128(&pr[t], d);
-@@ -300,24 +244,12 @@
+@@ -339,24 +283,12 @@ void ksw_exts2_sse(void *km, int qlen, c
  			for (t = st_; t <= en_; ++t) {
  				__m128i d, z, a, b, a2, a2a, xt1, x2t1, vt1, ut, tmp, tmp2;
  				__dp_code_block1;
@@ -455,7 +461,7 @@
  				__dp_code_block2;
  				tmp = _mm_cmpgt_epi8(zero_, a);
  				_mm_store_si128(&x[t],  _mm_sub_epi8(_mm_andnot_si128(tmp, a),  qe_));
-@@ -328,11 +260,7 @@
+@@ -367,11 +299,7 @@ void ksw_exts2_sse(void *km, int qlen, c
  
  				tmp2 = _mm_load_si128(&donor[t]);
  				tmp = _mm_cmpgt_epi8(tmp2, a2);
@@ -467,7 +473,7 @@
  				_mm_store_si128(&x2[t], _mm_sub_epi8(tmp2, q2_));
  				d = _mm_or_si128(d, _mm_andnot_si128(tmp, _mm_set1_epi8(0x20))); // d = a > 0? 1<<5 : 0
  				_mm_store_si128(&pr[t], d);
-@@ -356,13 +284,8 @@
+@@ -395,13 +323,8 @@ void ksw_exts2_sse(void *km, int qlen, c
  					_mm_storeu_si128((__m128i*)&H[t], H1);
  					t_ = _mm_set1_epi32(t);
  					tmp = _mm_cmpgt_epi32(H1, max_H_);
@@ -481,7 +487,7 @@
  				}
  				_mm_storeu_si128((__m128i*)HH, max_H_);
  				_mm_storeu_si128((__m128i*)tt, max_t_);
-@@ -413,4 +336,3 @@
+@@ -452,4 +375,3 @@ void ksw_exts2_sse(void *km, int qlen, c
  		kfree(km, mem2); kfree(km, off);
  	}
  }
diff --git a/debian/patches/simde b/debian/patches/simde
index 32d9b61..448fedd 100644
--- a/debian/patches/simde
+++ b/debian/patches/simde
@@ -1,5 +1,7 @@
---- minimap2.orig/Makefile.simde
-+++ minimap2/Makefile.simde
+Index: minimap2.git/Makefile.simde
+===================================================================
+--- minimap2.git.orig/Makefile.simde
++++ minimap2.git/Makefile.simde
 @@ -1,8 +1,8 @@
 -CFLAGS=		-g -Wall -O2 -Wc++-compat #-Wextra
 -CPPFLAGS=	-DHAVE_KALLOC -DUSE_SIMDE -DSIMDE_ENABLE_NATIVE_ALIASES
@@ -14,7 +16,7 @@
  PROG=		minimap2
  PROG_EXTRA=	sdust minimap2-lite
  LIBS=		-lm -lz -lpthread
-@@ -37,28 +37,27 @@
+@@ -37,28 +37,27 @@ all:$(PROG)
  extra:all $(PROG_EXTRA)
  
  minimap2:main.o libminimap2.a
@@ -55,7 +57,7 @@
  
  # other non-file targets
  
-@@ -80,9 +79,6 @@
+@@ -80,9 +79,6 @@ hit.o: mmpriv.h minimap.h bseq.h kseq.h
  index.o: kthread.h bseq.h minimap.h mmpriv.h kseq.h kvec.h kalloc.h khash.h
  index.o: ksort.h
  kalloc.o: kalloc.h
diff --git a/format.c b/format.c
index ac7cd8e..fc03f00 100644
--- a/format.c
+++ b/format.c
@@ -119,6 +119,7 @@ int mm_write_sam_hdr(const mm_idx_t *idx, const char *rg, const char *ver, int a
 {
 	kstring_t str = {0,0,0};
 	int ret = 0;
+	mm_sprintf_lite(&str, "@HD\tVN:1.6\tSO:unsorted\tGO:query\n");
 	if (idx) {
 		uint32_t i;
 		for (i = 0; i < idx->n_seq; ++i)
@@ -369,14 +370,16 @@ static void write_sam_cigar(kstring_t *s, int sam_flag, int in_tag, int qlen, co
 		clip_len[0] = r->rev? qlen - r->qe : r->qs;
 		clip_len[1] = r->rev? r->qs : qlen - r->qe;
 		if (in_tag) {
-			int clip_char = (sam_flag&0x800) && !(opt_flag&MM_F_SOFTCLIP)? 5 : 4;
+			int clip_char = (((sam_flag&0x800) || ((sam_flag&0x100) && (opt_flag&MM_F_SECONDARY_SEQ))) &&
+							 !(opt_flag&MM_F_SOFTCLIP)) ? 5 : 4;
 			mm_sprintf_lite(s, "\tCG:B:I");
 			if (clip_len[0]) mm_sprintf_lite(s, ",%u", clip_len[0]<<4|clip_char);
 			for (k = 0; k < r->p->n_cigar; ++k)
 				mm_sprintf_lite(s, ",%u", r->p->cigar[k]);
 			if (clip_len[1]) mm_sprintf_lite(s, ",%u", clip_len[1]<<4|clip_char);
 		} else {
-			int clip_char = (sam_flag&0x800) && !(opt_flag&MM_F_SOFTCLIP)? 'H' : 'S';
+			int clip_char = (((sam_flag&0x800) || ((sam_flag&0x100) && (opt_flag&MM_F_SECONDARY_SEQ))) &&
+							 !(opt_flag&MM_F_SOFTCLIP)) ? 'H' : 'S';
 			assert(clip_len[0] < qlen && clip_len[1] < qlen);
 			if (clip_len[0]) mm_sprintf_lite(s, "%d%c", clip_len[0], clip_char);
 			for (k = 0; k < r->p->n_cigar; ++k)
@@ -451,7 +454,7 @@ void mm_write_sam3(kstring_t *s, const mm_idx_t *mi, const mm_bseq1_t *t, int se
 		if (cigar_in_tag) {
 			int slen;
 			if ((flag & 0x900) == 0 || (opt_flag & MM_F_SOFTCLIP)) slen = t->l_seq;
-			else if (flag & 0x100) slen = 0;
+			else if ((flag & 0x100) && !(opt_flag & MM_F_SECONDARY_SEQ)) slen = 0;
 			else slen = r->qe - r->qs;
 			mm_sprintf_lite(s, "%dS%dN", slen, r->re - r->rs);
 		} else write_sam_cigar(s, flag, 0, t->l_seq, r, opt_flag);
@@ -492,7 +495,7 @@ void mm_write_sam3(kstring_t *s, const mm_idx_t *mi, const mm_bseq1_t *t, int se
 			mm_sprintf_lite(s, "\t");
 			if (t->qual) sam_write_sq(s, t->qual, t->l_seq, r->rev, 0);
 			else mm_sprintf_lite(s, "*");
-		} else if (flag & 0x100) {
+		} else if ((flag & 0x100) && !(opt_flag & MM_F_SECONDARY_SEQ)){
 			mm_sprintf_lite(s, "*\t*");
 		} else {
 			sam_write_sq(s, t->seq + r->qs, r->qe - r->qs, r->rev, r->rev);
diff --git a/kalloc.c b/kalloc.c
index 8499552..f5de41a 100644
--- a/kalloc.c
+++ b/kalloc.c
@@ -40,7 +40,8 @@ void *km_init2(void *km_par, size_t min_core_size)
 	kmem_t *km;
 	km = (kmem_t*)kcalloc(km_par, 1, sizeof(kmem_t));
 	km->par = km_par;
-	km->min_core_size = min_core_size > 0? min_core_size : 0x80000;
+	if (km_par) km->min_core_size = min_core_size > 0? min_core_size : ((kmem_t*)km_par)->min_core_size - 2;
+	else km->min_core_size = min_core_size > 0? min_core_size : 0x80000;
 	return (void*)km;
 }
 
@@ -183,6 +184,16 @@ void *krealloc(void *_km, void *ap, size_t n_bytes) // TODO: this can be made mo
 	return q;
 }
 
+void *krelocate(void *km, void *ap, size_t n_bytes)
+{
+	void *p;
+	if (km == 0 || ap == 0) return ap;
+	p = kmalloc(km, n_bytes);
+	memcpy(p, ap, n_bytes);
+	kfree(km, ap);
+	return p;
+}
+
 void km_stat(const void *_km, km_stat_t *s)
 {
 	kmem_t *km = (kmem_t*)_km;
@@ -203,3 +214,11 @@ void km_stat(const void *_km, km_stat_t *s)
 		s->largest = s->largest > size? s->largest : size;
 	}
 }
+
+void km_stat_print(const void *km)
+{
+	km_stat_t st;
+	km_stat(km, &st);
+	fprintf(stderr, "[km_stat] cap=%ld, avail=%ld, largest=%ld, n_core=%ld, n_block=%ld\n",
+			st.capacity, st.available, st.largest, st.n_blocks, st.n_cores);
+}
diff --git a/kalloc.h b/kalloc.h
index 93bff5e..4378672 100644
--- a/kalloc.h
+++ b/kalloc.h
@@ -13,6 +13,7 @@ typedef struct {
 
 void *kmalloc(void *km, size_t size);
 void *krealloc(void *km, void *ptr, size_t size);
+void *krelocate(void *km, void *ap, size_t n_bytes);
 void *kcalloc(void *km, size_t count, size_t size);
 void kfree(void *km, void *ptr);
 
@@ -20,11 +21,21 @@ void *km_init(void);
 void *km_init2(void *km_par, size_t min_core_size);
 void km_destroy(void *km);
 void km_stat(const void *_km, km_stat_t *s);
+void km_stat_print(const void *km);
 
 #ifdef __cplusplus
 }
 #endif
 
+#define Kmalloc(km, type, cnt)       ((type*)kmalloc((km), (cnt) * sizeof(type)))
+#define Kcalloc(km, type, cnt)       ((type*)kcalloc((km), (cnt), sizeof(type)))
+#define Krealloc(km, type, ptr, cnt) ((type*)krealloc((km), (ptr), (cnt) * sizeof(type)))
+
+#define Kexpand(km, type, a, m) do { \
+		(m) = (m) >= 4? (m) + ((m)>>1) : 16; \
+		(a) = Krealloc(km, type, (a), (m)); \
+	} while (0)
+
 #define KMALLOC(km, ptr, len) ((ptr) = (__typeof__(ptr))kmalloc((km), (len) * sizeof(*(ptr))))
 #define KCALLOC(km, ptr, len) ((ptr) = (__typeof__(ptr))kcalloc((km), (len), sizeof(*(ptr))))
 #define KREALLOC(km, ptr, len) ((ptr) = (__typeof__(ptr))krealloc((km), (ptr), (len) * sizeof(*(ptr))))
@@ -50,7 +61,7 @@ void km_stat(const void *_km, km_stat_t *s);
 	} kmp_##name##_t; \
 	SCOPE kmp_##name##_t *kmp_init_##name(void *km) { \
 		kmp_##name##_t *mp; \
-		KCALLOC(km, mp, 1); \
+		mp = Kcalloc(km, kmp_##name##_t, 1); \
 		mp->km = km; \
 		return mp; \
 	} \
@@ -66,7 +77,7 @@ void km_stat(const void *_km, km_stat_t *s);
 	} \
 	SCOPE void kmp_free_##name(kmp_##name##_t *mp, kmptype_t *p) { \
 		--mp->cnt; \
-		if (mp->n == mp->max) KEXPAND(mp->km, mp->buf, mp->max); \
+		if (mp->n == mp->max) Kexpand(mp->km, kmptype_t*, mp->buf, mp->max); \
 		mp->buf[mp->n++] = p; \
 	}
 
diff --git a/ksw2.h b/ksw2.h
index cbd1ddc..1f94c6f 100644
--- a/ksw2.h
+++ b/ksw2.h
@@ -15,6 +15,7 @@
 #define KSW_EZ_SPLICE_FOR  0x100
 #define KSW_EZ_SPLICE_REV  0x200
 #define KSW_EZ_SPLICE_FLANK 0x400
+#define KSW_EZ_SPLICE_CMPLX 0x800
 
 // The subset of CIGAR operators used by ksw code.
 // Use MM_CIGAR_* from minimap.h if you need the full list.
diff --git a/ksw2_extd2_sse.c b/ksw2_extd2_sse.c
index 162e9e2..8f96eb3 100644
--- a/ksw2_extd2_sse.c
+++ b/ksw2_extd2_sse.c
@@ -358,7 +358,7 @@ void ksw_extd2_sse(void *km, int qlen, const uint8_t *query, int tlen, const uin
 			} else H[0] = v8[0] - qe, max_H = H[0], max_t = 0; // special casing r==0
 			// update ez
 			if (en0 == tlen - 1 && H[en0] > ez->mte)
-				ez->mte = H[en0], ez->mte_q = r - en;
+				ez->mte = H[en0], ez->mte_q = r - en0;
 			if (r - st0 == qlen - 1 && H[st0] > ez->mqe)
 				ez->mqe = H[st0], ez->mqe_t = st0;
 			if (ksw_apply_zdrop(ez, 1, max_H, r, max_t, zdrop, e2)) break;
diff --git a/ksw2_exts2_sse.c b/ksw2_exts2_sse.c
index 4157e38..746778e 100644
--- a/ksw2_exts2_sse.c
+++ b/ksw2_exts2_sse.c
@@ -71,6 +71,7 @@ void ksw_exts2_sse(void *km, int qlen, const uint8_t *query, int tlen, const uin
 
 	ksw_reset_extz(ez);
 	if (m <= 1 || qlen <= 0 || tlen <= 0 || q2 <= q + e) return;
+	assert((flag & KSW_EZ_SPLICE_FOR) == 0 || (flag & KSW_EZ_SPLICE_REV) == 0); // can't be both set
 
 	zero_   = _mm_set1_epi8(0);
 	q_      = _mm_set1_epi8(q);
@@ -118,55 +119,93 @@ void ksw_exts2_sse(void *km, int qlen, const uint8_t *query, int tlen, const uin
 
 	// set the donor and acceptor arrays. TODO: this assumes 0/1/2/3 encoding!
 	if (flag & (KSW_EZ_SPLICE_FOR|KSW_EZ_SPLICE_REV)) {
-		int semi_cost = flag&KSW_EZ_SPLICE_FLANK? -noncan/2 : 0; // GTr or yAG is worth 0.5 bit; see PMID:18688272
-		memset(donor, -noncan, tlen_ * 16);
-		memset(acceptor, -noncan, tlen_ * 16);
+		const int sp0[4] = { 8, 15, 21, 30 };
+		int sp[4];
+		if (flag & KSW_EZ_SPLICE_CMPLX) {
+			for (t = 0; t < 4; ++t)
+				sp[t] = (int)((double)sp0[t] / 3. + .499);
+		} else {
+			sp[0] = flag&KSW_EZ_SPLICE_FLANK? noncan / 2 : 0;
+			sp[1] = sp[2] = sp[3] = noncan;
+		}
+		memset(donor,    -sp[3], tlen_ * 16);
+		memset(acceptor, -sp[3], tlen_ * 16);
 		if (!(flag & KSW_EZ_REV_CIGAR)) {
 			for (t = 0; t < tlen - 4; ++t) {
-				int can_type = 0; // type of canonical site: 0=none, 1=GT/AG only, 2=GTr/yAG
-				if ((flag & KSW_EZ_SPLICE_FOR) && target[t+1] == 2 && target[t+2] == 3) can_type = 1; // GTr...
-				if ((flag & KSW_EZ_SPLICE_REV) && target[t+1] == 1 && target[t+2] == 3) can_type = 1; // CTr...
-				if (can_type && (target[t+3] == 0 || target[t+3] == 2)) can_type = 2;
-				if (can_type) ((int8_t*)donor)[t] = can_type == 2? 0 : semi_cost;
+				int z = 3;
+				if (flag & KSW_EZ_SPLICE_FOR) {
+					if (target[t+1] == 2 && target[t+2] == 3)             // |GT.
+						z = target[t+3] == 0 || target[t+3] == 2? -1 : 0; // |GTr or not
+					else if (target[t+1] == 2 && target[t+2] == 1) z = 1; // |GC.
+					else if (target[t+1] == 0 && target[t+2] == 3) z = 2; // |AT.
+				} else if (flag & KSW_EZ_SPLICE_REV) {
+					if (target[t+1] == 1 && target[t+2] == 3)             // |CT. (revcomp of .AG|)
+						z = target[t+3] == 0 || target[t+3] == 2? -1 : 0;
+					else if (target[t+1] == 2 && target[t+2] == 3) z = 2; // |GT. (revcomp of .AC|)
+				}
+				((int8_t*)donor)[t] = z < 0? 0 : -sp[z];
 			}
-			if (junc)
-				for (t = 0; t < tlen - 1; ++t)
-					if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t+1]&1)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t+1]&8)))
-						((int8_t*)donor)[t] += junc_bonus;
 			for (t = 2; t < tlen; ++t) {
-				int can_type = 0;
-				if ((flag & KSW_EZ_SPLICE_FOR) && target[t-1] == 0 && target[t] == 2) can_type = 1; // ...yAG
-				if ((flag & KSW_EZ_SPLICE_REV) && target[t-1] == 0 && target[t] == 1) can_type = 1; // ...yAC
-				if (can_type && (target[t-2] == 1 || target[t-2] == 3)) can_type = 2;
-				if (can_type) ((int8_t*)acceptor)[t] = can_type == 2? 0 : semi_cost;
+				int z = 3;
+				if (flag & KSW_EZ_SPLICE_FOR) {
+					if (target[t-1] == 0 && target[t] == 2)               // .AG|
+						z = target[t-2] == 1 || target[t-2] == 3? -1 : 0; // yAG| or not
+					else if (target[t-1] == 0 && target[t] == 1) z = 2;   // .AC|
+				} else if (flag & KSW_EZ_SPLICE_REV) {
+					if (target[t-1] == 0 && target[t] == 1)               // .AC| (revcomp of |GT.)
+						z = target[t-2] == 1 || target[t-2] == 3? -1 : 0; // yAC| or not
+					else if (target[t-1] == 2 && target[t] == 1) z = 1;   // .GC| (revcomp of |GC.)
+					else if (target[t-1] == 0 && target[t] == 3) z = 2;   // .AT| (revcomp of |AT.)
+				}
+				((int8_t*)acceptor)[t] = z < 0? 0 : -sp[z];
 			}
-			if (junc)
-				for (t = 0; t < tlen; ++t)
-					if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t]&2)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t]&4)))
-						((int8_t*)acceptor)[t] += junc_bonus;
 		} else {
 			for (t = 0; t < tlen - 4; ++t) {
-				int can_type = 0; // type of canonical site: 0=none, 1=GT/AG only, 2=GTr/yAG
-				if ((flag & KSW_EZ_SPLICE_FOR) && target[t+1] == 2 && target[t+2] == 0) can_type = 1; // GAy...
-				if ((flag & KSW_EZ_SPLICE_REV) && target[t+1] == 1 && target[t+2] == 0) can_type = 1; // CAy...
-				if (can_type && (target[t+3] == 1 || target[t+3] == 3)) can_type = 2;
-				if (can_type) ((int8_t*)donor)[t] = can_type == 2? 0 : semi_cost;
+				int z = 3;
+				if (flag & KSW_EZ_SPLICE_FOR) {
+					if (target[t+1] == 2 && target[t+2] == 0)             // |GA. (rev of .AG|)
+						z = target[t+3] == 1 || target[t+3] == 3? -1 : 0;
+					else if (target[t+1] == 1 && target[t+2] == 0) z = 2; // |CA. (rev of .AC|)
+				} else if (flag & KSW_EZ_SPLICE_REV) {
+					if (target[t+1] == 1 && target[t+2] == 0)             // |CA. (comp of |GT.)
+						z = target[t+3] == 1 || target[t+3] == 3? -1 : 0;
+					else if (target[t+1] == 1 && target[t+2] == 2) z = 1; // |CG. (comp of |GC.)
+					else if (target[t+1] == 3 && target[t+2] == 0) z = 2; // |TA. (comp of |AT.)
+				}
+				((int8_t*)donor)[t] = z < 0? 0 : -sp[z];
 			}
-			if (junc)
-				for (t = 0; t < tlen - 1; ++t)
-					if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t+1]&2)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t+1]&4)))
-						((int8_t*)donor)[t] += junc_bonus;
 			for (t = 2; t < tlen; ++t) {
-				int can_type = 0;
-				if ((flag & KSW_EZ_SPLICE_FOR) && target[t-1] == 3 && target[t] == 2) can_type = 1; // ...rTG
-				if ((flag & KSW_EZ_SPLICE_REV) && target[t-1] == 3 && target[t] == 1) can_type = 1; // ...rTC
-				if (can_type && (target[t-2] == 0 || target[t-2] == 2)) can_type = 2;
-				if (can_type) ((int8_t*)acceptor)[t] = can_type == 2? 0 : semi_cost;
+				int z = 3;
+				if (flag & KSW_EZ_SPLICE_FOR) {
+					if (target[t-1] == 3 && target[t] == 2)               // .TG| (rev of |GT.)
+						z = target[t-2] == 0 || target[t-2] == 2? -1 : 0;
+					else if (target[t-1] == 1 && target[t] == 2) z = 1;   // .CG| (rev of |GC.)
+					else if (target[t-1] == 3 && target[t] == 0) z = 2;   // .TA| (rev of |AT.)
+				} else if (flag & KSW_EZ_SPLICE_REV) {
+					if (target[t-1] == 3 && target[t] == 1)               // .TC| (comp of .AG|)
+						z = target[t-2] == 0 || target[t-2] == 2? -1 : 0;
+					else if (target[t-1] == 3 && target[t] == 2) z = 2;   // .TG| (comp of .AC|)
+				}
+				((int8_t*)acceptor)[t] = z < 0? 0 : -sp[z];
 			}
-			if (junc)
-				for (t = 0; t < tlen; ++t)
-					if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t]&1)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t]&8)))
-						((int8_t*)acceptor)[t] += junc_bonus;
+		}
+	}
+
+	if (junc) {
+		if (!(flag & KSW_EZ_REV_CIGAR)) {
+			for (t = 0; t < tlen - 1; ++t)
+				if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t+1]&1)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t+1]&8)))
+					((int8_t*)donor)[t] += junc_bonus;
+			for (t = 0; t < tlen; ++t)
+				if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t]&2)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t]&4)))
+					((int8_t*)acceptor)[t] += junc_bonus;
+		} else {
+			for (t = 0; t < tlen - 1; ++t)
+				if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t+1]&2)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t+1]&4)))
+					((int8_t*)donor)[t] += junc_bonus;
+			for (t = 0; t < tlen; ++t)
+				if (((flag & KSW_EZ_SPLICE_FOR) && (junc[t]&1)) || ((flag & KSW_EZ_SPLICE_REV) && (junc[t]&8)))
+					((int8_t*)acceptor)[t] += junc_bonus;
 		}
 	}
 
@@ -376,7 +415,7 @@ void ksw_exts2_sse(void *km, int qlen, const uint8_t *query, int tlen, const uin
 			} else H[0] = v8[0] - qe, max_H = H[0], max_t = 0; // special casing r==0
 			// update ez
 			if (en0 == tlen - 1 && H[en0] > ez->mte)
-				ez->mte = H[en0], ez->mte_q = r - en;
+				ez->mte = H[en0], ez->mte_q = r - en0;
 			if (r - st0 == qlen - 1 && H[st0] > ez->mqe)
 				ez->mqe = H[st0], ez->mqe_t = st0;
 			if (ksw_apply_zdrop(ez, 1, max_H, r, max_t, zdrop, 0)) break;
diff --git a/ksw2_extz2_sse.c b/ksw2_extz2_sse.c
index ad19131..a2154fe 100644
--- a/ksw2_extz2_sse.c
+++ b/ksw2_extz2_sse.c
@@ -269,7 +269,7 @@ void ksw_extz2_sse(void *km, int qlen, const uint8_t *query, int tlen, const uin
 			} else H[0] = v8[0] - qe - qe, max_H = H[0], max_t = 0; // special casing r==0
 			// update ez
 			if (en0 == tlen - 1 && H[en0] > ez->mte)
-				ez->mte = H[en0], ez->mte_q = r - en;
+				ez->mte = H[en0], ez->mte_q = r - en0;
 			if (r - st0 == qlen - 1 && H[st0] > ez->mqe)
 				ez->mqe = H[st0], ez->mqe_t = st0;
 			if (ksw_apply_zdrop(ez, 1, max_H, r, max_t, zdrop, e)) break;
diff --git a/lchain.c b/lchain.c
index d004157..7df5cab 100644
--- a/lchain.c
+++ b/lchain.c
@@ -35,7 +35,7 @@ uint64_t *mg_chain_backtrack(void *km, int64_t n, const int32_t *f, const int64_
 	for (i = 0, n_z = 0; i < n; ++i) // precompute n_z
 		if (f[i] >= min_sc) ++n_z;
 	if (n_z == 0) return 0;
-	KMALLOC(km, z, n_z);
+	z = Kmalloc(km, mm128_t, n_z);
 	for (i = 0, k = 0; i < n; ++i) // populate z[]
 		if (f[i] >= min_sc) z[k].x = f[i], z[k++].y = i;
 	radix_sort_128x(z, z + n_z);
@@ -54,7 +54,7 @@ uint64_t *mg_chain_backtrack(void *km, int64_t n, const int32_t *f, const int64_
 			else n_v = n_v0;
 		}
 	}
-	KMALLOC(km, u, n_u);
+	u = Kmalloc(km, uint64_t, n_u);
 	memset(t, 0, n * 4);
 	for (k = n_z - 1, n_v = n_u = 0; k >= 0; --k) { // populate u[]
 		if (t[z[k].y] == 0) {
@@ -82,7 +82,7 @@ static mm128_t *compact_a(void *km, int32_t n_u, uint64_t *u, int32_t n_v, int32
 	int64_t i, j, k;
 
 	// write the result to b[]
-	KMALLOC(km, b, n_v);
+	b = Kmalloc(km, mm128_t, n_v);
 	for (i = 0, k = 0; i < n_u; ++i) {
 		int32_t k0 = k, ni = (int32_t)u[i];
 		for (j = 0; j < ni; ++j)
@@ -91,13 +91,13 @@ static mm128_t *compact_a(void *km, int32_t n_u, uint64_t *u, int32_t n_v, int32
 	kfree(km, v);
 
 	// sort u[] and a[] by the target position, such that adjacent chains may be joined
-	KMALLOC(km, w, n_u);
+	w = Kmalloc(km, mm128_t, n_u);
 	for (i = k = 0; i < n_u; ++i) {
 		w[i].x = b[k].x, w[i].y = (uint64_t)k<<32|i;
 		k += (int32_t)u[i];
 	}
 	radix_sort_128x(w, w + n_u);
-	KMALLOC(km, u2, n_u);
+	u2 = Kmalloc(km, uint64_t, n_u);
 	for (i = k = 0; i < n_u; ++i) {
 		int32_t j = (int32_t)w[i].y, n = (int32_t)u[j];
 		u2[i] = u[j];
@@ -138,7 +138,7 @@ static inline int32_t comput_sc(const mm128_t *ai, const mm128_t *aj, int32_t ma
 }
 
 /* Input:
- *   a[].x: tid<<33 | rev<<32 | tpos
+ *   a[].x: rev<<63 | tid<<32 | tpos
  *   a[].y: flags<<40 | q_span<<32 | q_pos
  * Output:
  *   n_u: #chains
@@ -160,10 +160,10 @@ mm128_t *mg_lchain_dp(int max_dist_x, int max_dist_y, int bw, int max_skip, int
 	if (max_dist_x < bw) max_dist_x = bw;
 	if (max_dist_y < bw && !is_cdna) max_dist_y = bw;
 	if (is_cdna) max_drop = INT32_MAX;
-	KMALLOC(km, p, n);
-	KMALLOC(km, f, n);
-	KMALLOC(km, v, n);
-	KCALLOC(km, t, n);
+	p = Kmalloc(km, int64_t, n);
+	f = Kmalloc(km, int32_t, n);
+	v = Kmalloc(km, int32_t, n);
+	t = Kcalloc(km, int32_t, n);
 
 	// fill the score and backtrack arrays
 	for (i = 0, max_ii = -1; i < n; ++i) {
@@ -251,7 +251,7 @@ mm128_t *mg_lchain_rmq(int max_dist, int max_dist_inner, int bw, int max_chn_ski
 					   int64_t n, mm128_t *a, int *n_u_, uint64_t **_u, void *km)
 {
 	int32_t *f,*t, *v, n_u, n_v, mmax_f = 0, max_rmq_size = 0, max_drop = bw;
-	int64_t *p, i, i0, st = 0, st_inner = 0, n_iter = 0;
+	int64_t *p, i, i0, st = 0, st_inner = 0;
 	uint64_t *u;
 	lc_elem_t *root = 0, *root_inner = 0;
 	void *mem_mp = 0;
@@ -264,10 +264,10 @@ mm128_t *mg_lchain_rmq(int max_dist, int max_dist_inner, int bw, int max_chn_ski
 	}
 	if (max_dist < bw) max_dist = bw;
 	if (max_dist_inner <= 0 || max_dist_inner >= max_dist) max_dist_inner = 0;
-	KMALLOC(km, p, n);
-	KMALLOC(km, f, n);
-	KCALLOC(km, t, n);
-	KMALLOC(km, v, n);
+	p = Kmalloc(km, int64_t, n);
+	f = Kmalloc(km, int32_t, n);
+	t = Kcalloc(km, int32_t, n);
+	v = Kmalloc(km, int32_t, n);
 	mem_mp = km_init2(km, 0x10000);
 	mp = kmp_init_rmq(mem_mp);
 
@@ -345,7 +345,6 @@ mm128_t *mg_lchain_rmq(int max_dist, int max_dist_inner, int bw, int max_chn_ski
 						}
 						if (!krmq_itr_prev(lc_elem, &itr)) break;
 					}
-					n_iter += n_rmq_iter;
 				}
 			}
 		}
diff --git a/main.c b/main.c
index 0be9933..7db642d 100644
--- a/main.c
+++ b/main.c
@@ -7,8 +7,6 @@
 #include "mmpriv.h"
 #include "ketopt.h"
 
-#define MM_VERSION "2.24-r1122"
-
 #ifdef __linux__
 #include <sys/resource.h>
 #include <sys/time.h>
@@ -78,6 +76,7 @@ static ko_longopt_t long_options[] = {
 	{ "chain-skip-scale",ko_required_argument,351 },
 	{ "print-chains",   ko_no_argument,       352 },
 	{ "no-hash-name",   ko_no_argument,       353 },
+	{ "secondary-seq",  ko_no_argument,       354 },
 	{ "help",           ko_no_argument,       'h' },
 	{ "max-intron-len", ko_required_argument, 'G' },
 	{ "version",        ko_no_argument,       'V' },
@@ -121,7 +120,7 @@ static inline void yes_or_no(mm_mapopt_t *opt, int64_t flag, int long_idx, const
 
 int main(int argc, char *argv[])
 {
-	const char *opt_str = "2aSDw:k:K:t:r:f:Vv:g:G:I:d:XT:s:x:Hcp:M:n:z:A:B:O:E:m:N:Qu:R:hF:LC:yYPo:e:U:";
+	const char *opt_str = "2aSDw:k:K:t:r:f:Vv:g:G:I:d:XT:s:x:Hcp:M:n:z:A:B:O:E:m:N:Qu:R:hF:LC:yYPo:e:U:J:";
 	ketopt_t o = KETOPT_INIT;
 	mm_mapopt_t opt;
 	mm_idxopt_t ipt;
@@ -187,7 +186,12 @@ int main(int argc, char *argv[])
 		else if (c == 'R') rg = o.arg;
 		else if (c == 'h') fp_help = stdout;
 		else if (c == '2') opt.flag |= MM_F_2_IO_THREADS;
-		else if (c == 'o') {
+		else if (c == 'J') {
+			int t;
+			t = atoi(o.arg);
+			if (t == 0) opt.flag |= MM_F_SPLICE_OLD;
+			else if (t == 1) opt.flag &= ~MM_F_SPLICE_OLD;
+		} else if (c == 'o') {
 			if (strcmp(o.arg, "-") != 0) {
 				if (freopen(o.arg, "wb", stdout) == NULL) {
 					fprintf(stderr, "[ERROR]\033[1;31m failed to write the output to file '%s'\033[0m: %s\n", o.arg, strerror(errno));
@@ -237,6 +241,7 @@ int main(int argc, char *argv[])
 		else if (c == 350) opt.q_occ_frac = atof(o.arg); // --q-occ-frac
 		else if (c == 352) mm_dbg_flag |= MM_DBG_PRINT_CHAIN; // --print-chains
 		else if (c == 353) opt.flag |= MM_F_NO_HASH_NAME; // --no-hash-name
+		else if (c == 354) opt.flag |= MM_F_SECONDARY_SEQ; // --secondary-seq
 		else if (c == 330) {
 			fprintf(stderr, "[WARNING] \033[1;31m --lj-min-ratio has been deprecated.\033[0m\n");
 		} else if (c == 314) { // --frag
@@ -261,7 +266,8 @@ int main(int argc, char *argv[])
 		} else if (c == 326) { // --dual
 			yes_or_no(&opt, MM_F_NO_DUAL, o.longidx, o.arg, 0);
 		} else if (c == 347) { // --rmq
-			yes_or_no(&opt, MM_F_RMQ, o.longidx, o.arg, 1);
+			if (o.arg) yes_or_no(&opt, MM_F_RMQ, o.longidx, o.arg, 1);
+			else opt.flag |= MM_F_RMQ;
 		} else if (c == 'S') {
 			opt.flag |= MM_F_OUT_CS | MM_F_CIGAR | MM_F_OUT_CS_LONG;
 			if (mm_verbose >= 2)
@@ -322,7 +328,7 @@ int main(int argc, char *argv[])
 		fprintf(fp_help, "    -H           use homopolymer-compressed k-mer (preferrable for PacBio)\n");
 		fprintf(fp_help, "    -k INT       k-mer size (no larger than 28) [%d]\n", ipt.k);
 		fprintf(fp_help, "    -w INT       minimizer window size [%d]\n", ipt.w);
-		fprintf(fp_help, "    -I NUM       split index for every ~NUM input bases [4G]\n");
+		fprintf(fp_help, "    -I NUM       split index for every ~NUM input bases [8G]\n");
 		fprintf(fp_help, "    -d FILE      dump index to FILE []\n");
 		fprintf(fp_help, "  Mapping:\n");
 		fprintf(fp_help, "    -f FLOAT     filter out top FLOAT fraction of repetitive minimizers [%g]\n", opt.mid_occ_frac);
@@ -344,6 +350,7 @@ int main(int argc, char *argv[])
 		fprintf(fp_help, "    -z INT[,INT] Z-drop score and inversion Z-drop score [%d,%d]\n", opt.zdrop, opt.zdrop_inv);
 		fprintf(fp_help, "    -s INT       minimal peak DP alignment score [%d]\n", opt.min_dp_max);
 		fprintf(fp_help, "    -u CHAR      how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]\n");
+		fprintf(fp_help, "    -J INT       splice mode. 0: original minimap2 model; 1: miniprot model [1]\n");
 		fprintf(fp_help, "  Input/Output:\n");
 		fprintf(fp_help, "    -a           output in the SAM format (PAF by default)\n");
 		fprintf(fp_help, "    -o FILE      output alignments to FILE [stdout]\n");
diff --git a/map.c b/map.c
index 5311468..038888f 100644
--- a/map.c
+++ b/map.c
@@ -10,11 +10,6 @@
 #include "bseq.h"
 #include "khash.h"
 
-struct mm_tbuf_s {
-	void *km;
-	int rep_len, frag_gap;
-};
-
 mm_tbuf_t *mm_tbuf_init(void)
 {
 	mm_tbuf_t *b;
diff --git a/minimap.h b/minimap.h
index 13e12e0..504bdc8 100644
--- a/minimap.h
+++ b/minimap.h
@@ -5,41 +5,45 @@
 #include <stdio.h>
 #include <sys/types.h>
 
-#define MM_F_NO_DIAG       0x001 // no exact diagonal hit
-#define MM_F_NO_DUAL       0x002 // skip pairs where query name is lexicographically larger than target name
-#define MM_F_CIGAR         0x004
-#define MM_F_OUT_SAM       0x008
-#define MM_F_NO_QUAL       0x010
-#define MM_F_OUT_CG        0x020
-#define MM_F_OUT_CS        0x040
-#define MM_F_SPLICE        0x080 // splice mode
-#define MM_F_SPLICE_FOR    0x100 // match GT-AG
-#define MM_F_SPLICE_REV    0x200 // match CT-AC, the reverse complement of GT-AG
-#define MM_F_NO_LJOIN      0x400
-#define MM_F_OUT_CS_LONG   0x800
-#define MM_F_SR            0x1000
-#define MM_F_FRAG_MODE     0x2000
-#define MM_F_NO_PRINT_2ND  0x4000
-#define MM_F_2_IO_THREADS  0x8000
-#define MM_F_LONG_CIGAR    0x10000
-#define MM_F_INDEPEND_SEG  0x20000
-#define MM_F_SPLICE_FLANK  0x40000
-#define MM_F_SOFTCLIP      0x80000
-#define MM_F_FOR_ONLY      0x100000
-#define MM_F_REV_ONLY      0x200000
-#define MM_F_HEAP_SORT     0x400000
-#define MM_F_ALL_CHAINS    0x800000
-#define MM_F_OUT_MD        0x1000000
-#define MM_F_COPY_COMMENT  0x2000000
-#define MM_F_EQX           0x4000000 // use =/X instead of M
-#define MM_F_PAF_NO_HIT    0x8000000 // output unmapped reads to PAF
-#define MM_F_NO_END_FLT    0x10000000
-#define MM_F_HARD_MLEVEL   0x20000000
-#define MM_F_SAM_HIT_ONLY  0x40000000
+#define MM_VERSION "2.26-r1175"
+
+#define MM_F_NO_DIAG       (0x001LL) // no exact diagonal hit
+#define MM_F_NO_DUAL       (0x002LL) // skip pairs where query name is lexicographically larger than target name
+#define MM_F_CIGAR         (0x004LL)
+#define MM_F_OUT_SAM       (0x008LL)
+#define MM_F_NO_QUAL       (0x010LL)
+#define MM_F_OUT_CG        (0x020LL)
+#define MM_F_OUT_CS        (0x040LL)
+#define MM_F_SPLICE        (0x080LL) // splice mode
+#define MM_F_SPLICE_FOR    (0x100LL) // match GT-AG
+#define MM_F_SPLICE_REV    (0x200LL) // match CT-AC, the reverse complement of GT-AG
+#define MM_F_NO_LJOIN      (0x400LL)
+#define MM_F_OUT_CS_LONG   (0x800LL)
+#define MM_F_SR            (0x1000LL)
+#define MM_F_FRAG_MODE     (0x2000LL)
+#define MM_F_NO_PRINT_2ND  (0x4000LL)
+#define MM_F_2_IO_THREADS  (0x8000LL)
+#define MM_F_LONG_CIGAR    (0x10000LL)
+#define MM_F_INDEPEND_SEG  (0x20000LL)
+#define MM_F_SPLICE_FLANK  (0x40000LL)
+#define MM_F_SOFTCLIP      (0x80000LL)
+#define MM_F_FOR_ONLY      (0x100000LL)
+#define MM_F_REV_ONLY      (0x200000LL)
+#define MM_F_HEAP_SORT     (0x400000LL)
+#define MM_F_ALL_CHAINS    (0x800000LL)
+#define MM_F_OUT_MD        (0x1000000LL)
+#define MM_F_COPY_COMMENT  (0x2000000LL)
+#define MM_F_EQX           (0x4000000LL) // use =/X instead of M
+#define MM_F_PAF_NO_HIT    (0x8000000LL) // output unmapped reads to PAF
+#define MM_F_NO_END_FLT    (0x10000000LL)
+#define MM_F_HARD_MLEVEL   (0x20000000LL)
+#define MM_F_SAM_HIT_ONLY  (0x40000000LL)
 #define MM_F_RMQ           (0x80000000LL)
 #define MM_F_QSTRAND       (0x100000000LL)
 #define MM_F_NO_INV        (0x200000000LL)
 #define MM_F_NO_HASH_NAME  (0x400000000LL)
+#define MM_F_SPLICE_OLD    (0x800000000LL)
+#define MM_F_SECONDARY_SEQ (0x1000000000LL)	//output SEQ field for seqondary alignments using hard clipping
 
 #define MM_I_HPC          0x1
 #define MM_I_NO_SEQ       0x2
@@ -189,6 +193,11 @@ typedef struct {
 } mm_idx_reader_t;
 
 // memory buffer for thread-local storage during mapping
+struct mm_tbuf_s {
+	void *km;
+	int rep_len, frag_gap;
+};
+
 typedef struct mm_tbuf_s mm_tbuf_t;
 
 // global variables
diff --git a/minimap2.1 b/minimap2.1
index a91c9d0..9e33971 100644
--- a/minimap2.1
+++ b/minimap2.1
@@ -1,4 +1,4 @@
-.TH minimap2 1 "18 December 2021" "minimap2-2.24 (r1122)" "Bioinformatics tools"
+.TH minimap2 1 "29 April 2023" "minimap2-2.26 (r1175)" "Bioinformatics tools"
 .SH NAME
 .PP
 minimap2 - mapping and alignment between collections of DNA sequences
@@ -79,6 +79,19 @@ Minimizer k-mer length [15]
 .BI -w \ INT
 Minimizer window size [10]. A minimizer is the smallest k-mer
 in a window of w consecutive k-mers.
+.TP
+.BI -j \ INT
+Syncmer submer size [10]. Option
+.B -j
+and
+.B -w
+will override each: if
+.B -w
+is applied after
+.BR -j ,
+.B -j
+will have no effect, and vice versa.
+
 .TP
 .B -H
 Use homopolymer-compressed (HPC) minimizers. An HPC sequence is constructed by
@@ -88,16 +101,17 @@ on the HPC sequence.
 .BI -I \ NUM
 Load at most
 .I NUM
-target bases into RAM for indexing [4G]. If there are more than
+target bases into RAM for indexing [8G]. If there are more than
 .I NUM
 bases in
 .IR target.fa ,
 minimap2 needs to read
 .I query.fa
-multiple times to map it against each batch of target sequences.
+multiple times to map it against each batch of target sequences. This would create a multi-part index.
 .I NUM
 may be ending with k/K/m/M/g/G. NB: mapping quality is incorrect given a
-multi-part index.
+multi-part index. See also option
+.BR --split-prefix .
 .TP
 .B --idx-no-seq
 Don't store target sequences in the index. It saves disk space and memory but
@@ -587,7 +601,7 @@ Up to 20% sequence divergence.
 .B splice
 Long-read spliced alignment
 .RB ( -k15
-.B -w5 --splice -g2k -G200k -A1 -B2 -O2,32 -E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0
+.B -w5 --splice -g2k -G200k -A1 -B2 -O2,32 -E1,0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0
 .BR --splice-flank=yes ).
 In the splice mode, 1) long deletions are taken as introns and represented as
 the
diff --git a/misc/paftools.js b/misc/paftools.js
index 35b9acb..a02e332 100755
--- a/misc/paftools.js
+++ b/misc/paftools.js
@@ -1,6 +1,6 @@
 #!/usr/bin/env k8
 
-var paftools_version = '2.24-r1122';
+var paftools_version = '2.26-r1175';
 
 /*****************************
  ***** Library functions *****
@@ -1532,22 +1532,24 @@ function paf_view(args)
 
 function paf_gff2bed(args)
 {
-	var c, fn_ucsc_fai = null, is_short = false, keep_gff = false, print_junc = false, output_gene = false;
-	while ((c = getopt(args, "u:sgjG")) != null) {
+	var c, fn_ucsc_fai = null, is_short = false, keep_gff = false, print_junc = false, output_gene = false, ens_canon_only = false;
+	while ((c = getopt(args, "u:sgjGe")) != null) {
 		if (c == 'u') fn_ucsc_fai = getopt.arg;
 		else if (c == 's') is_short = true;
 		else if (c == 'g') keep_gff = true;
 		else if (c == 'j') print_junc = true;
 		else if (c == 'G') output_gene = true;
+		else if (c == 'e') ens_canon_only = true;
 	}
 
 	if (getopt.ind == args.length) {
 		print("Usage: paftools.js gff2bed [options] <in.gff>");
 		print("Options:");
-		print("  -j       Output junction BED");
-		print("  -s       Print names in the short form");
+		print("  -j       output junction BED");
+		print("  -s       print names in the short form");
 		print("  -u FILE  hg38.fa.fai for chr name conversion");
-		print("  -g       Output GFF (used with -u)");
+		print("  -e       only show transcript tagged with 'Ensembl_canonical'");
+		print("  -g       output GFF (used with -u)");
 		exit(1);
 	}
 
@@ -1606,7 +1608,7 @@ function paf_gff2bed(args)
 		print(a[0][0], st, en, name, 1000, a[0][3], cds_st, cds_en, color, a.length, sizes.join(",") + ",", starts.join(",") + ",");
 	}
 
-	var re_gtf  = /\b(transcript_id|transcript_type|transcript_biotype|gene_name|gene_id|gbkey|transcript_name) "([^"]+)";/g;
+	var re_gtf  = /\b(transcript_id|transcript_type|transcript_biotype|gene_name|gene_id|gbkey|transcript_name|tag) "([^"]+)";/g;
 	var re_gff3 = /\b(transcript_id|transcript_type|transcript_biotype|gene_name|gene_id|gbkey|transcript_name)=([^;]+)/g;
 	var re_gtf_gene  = /\b(gene_id|gene_type|gene_name) "([^;]+)";/g;
 	var re_gff3_gene = /\b(gene_id|gene_type|source_gene|gene_biotype|gene_name)=([^;]+);/g;
@@ -1646,13 +1648,14 @@ function paf_gff2bed(args)
 		if (t[2] != "CDS" && t[2] != "exon") continue;
 		t[3] = parseInt(t[3]) - 1;
 		t[4] = parseInt(t[4]);
-		var id = null, type = "", name = "N/A", biotype = "", m, tname = "N/A";
+		var id = null, type = "", name = "N/A", biotype = "", m, tname = "N/A", ens_canonical = false;
 		while ((m = re_gtf.exec(t[8])) != null) {
 			if (m[1] == "transcript_id") id = m[2];
 			else if (m[1] == "transcript_type") type = m[2];
 			else if (m[1] == "transcript_biotype" || m[1] == "gbkey") biotype = m[2];
 			else if (m[1] == "gene_name" || m[1] == "gene_id") name = m[2];
 			else if (m[1] == "transcript_name") tname = m[2];
+			else if (m[1] == "tag" && m[2] == "Ensembl_canonical") ens_canonical = true;
 		}
 		while ((m = re_gff3.exec(t[8])) != null) {
 			if (m[1] == "transcript_id") id = m[2];
@@ -1661,6 +1664,7 @@ function paf_gff2bed(args)
 			else if (m[1] == "gene_name" || m[1] == "gene_id") name = m[2];
 			else if (m[1] == "transcript_name") tname = m[2];
 		}
+		if (ens_canon_only && !ens_canonical) continue;
 		if (type == "" && biotype != "") type = biotype;
 		if (id == null) throw Error("No transcript_id");
 		if (id != last_id) {
@@ -2341,12 +2345,15 @@ function paf_pbsim2fq(args)
 
 function paf_junceval(args)
 {
-	var c, l_fuzzy = 0, print_ovlp = false, print_err_only = false, first_only = false, chr_only = false;
-	while ((c = getopt(args, "l:epc")) != null) {
+	var c, l_fuzzy = 0, print_ovlp = false, print_err_only = false, first_only = false, chr_only = false, aa = false, is_bed = false;
+	while ((c = getopt(args, "l:epcab1")) != null) {
 		if (c == 'l') l_fuzzy = parseInt(getopt.arg);
 		else if (c == 'e') print_err_only = print_ovlp = true;
 		else if (c == 'p') print_ovlp = true;
 		else if (c == 'c') chr_only = true;
+		else if (c == 'a') aa = true;
+		else if (c == 'b') is_bed = true;
+		else if (c == '1') first_only = true;
 	}
 
 	if (args.length - getopt.ind < 1) {
@@ -2356,6 +2363,9 @@ function paf_junceval(args)
 		print("  -p        print overlapping introns");
 		print("  -e        print erroreous overlapping introns");
 		print("  -c        only consider alignments to /^(chr)?([0-9]+|X|Y)$/");
+		print("  -a        miniprot PAF as input");
+		print("  -b        BED as input");
+		print("  -1        only process the first alignment of each query");
 		exit(1);
 	}
 
@@ -2409,13 +2419,17 @@ function paf_junceval(args)
 
 	file = getopt.ind+1 >= args.length || args[getopt.ind+1] == '-'? new File() : new File(args[getopt.ind+1]);
 	var last_qname = null;
-	var re_cigar = /(\d+)([MIDNSHP=X])/g;
+	var re_cigar = /(\d+)([MIDNSHP=XFGUV])/g;
 	while (file.readline(buf) >= 0) {
 		var m, t = buf.toString().split("\t");
-		var ctg_name = null, cigar = null, pos = null, qname = t[0];
+		var ctg_name = null, cigar = null, pos = null, qname;
 
 		if (t[0].charAt(0) == '@') continue;
-		if (t[4] == '+' || t[4] == '-' || t[4] == '*') { // PAF
+		if (t[0] == "##PAF") t.shift();
+		qname = t[0];
+		if (is_bed) {
+			ctg_name = t[0], pos = parseInt(t[1]), cigar == null;
+		} else if (t[4] == '+' || t[4] == '-' || t[4] == '*') { // PAF
 			ctg_name = t[5], pos = parseInt(t[7]);
 			var type = 'P';
 			for (i = 12; i < t.length; ++i) {
@@ -2445,12 +2459,43 @@ function paf_junceval(args)
 		}
 
 		var intron = [];
-		while ((m = re_cigar.exec(cigar)) != null) {
-			var len = parseInt(m[1]), op = m[2];
-			if (op == 'N') {
-				intron.push([pos, pos + len]);
-				pos += len;
-			} else if (op == 'M' || op == 'X' || op == '=' || op == 'D') pos += len;
+		if (is_bed) {
+			intron.push([pos, parseInt(t[2])]);
+		} else if (aa) {
+			var tmp_junc = [], tmp = 0;
+			while ((m = re_cigar.exec(cigar)) != null) {
+				var len = parseInt(m[1]), op = m[2];
+				if (op == 'N') {
+					tmp_junc.push([tmp, tmp + len]);
+					tmp += len;
+				} else if (op == 'U') {
+					tmp_junc.push([tmp + 1, tmp + len - 2]);
+					tmp += len;
+				} else if (op == 'V') {
+					tmp_junc.push([tmp + 2, tmp + len - 1]);
+					tmp += len;
+				} else if (op == 'M' || op == 'X' || op == '=' || op == 'D') {
+					tmp += len * 3;
+				} else if (op == 'F' || op == 'G') {
+					tmp += len;
+				}
+			}
+			if (t[4] == '+') {
+				for (var i = 0; i < tmp_junc.length; ++i)
+					intron.push([pos + tmp_junc[i][0], pos + tmp_junc[i][1]]);
+			} else if (t[4] == '-') {
+				var glen = parseInt(t[8]) - parseInt(t[7]);
+				for (var i = tmp_junc.length - 1; i >= 0; --i)
+					intron.push([pos + (glen - tmp_junc[i][1]), pos + (glen - tmp_junc[i][0])]);
+			}
+		} else {
+			while ((m = re_cigar.exec(cigar)) != null) {
+				var len = parseInt(m[1]), op = m[2];
+				if (op == 'N') {
+					intron.push([pos, pos + len]);
+					pos += len;
+				} else if (op == 'M' || op == 'X' || op == '=' || op == 'D') pos += len;
+			}
 		}
 		if (intron.length == 0) {
 			++n_sgl;
@@ -2509,6 +2554,276 @@ function paf_junceval(args)
 	}
 }
 
+function paf_exoneval(args) // adapted from paf_junceval()
+{
+	var c, l_fuzzy = 0, print_ovlp = false, print_err_only = false, first_only = false, chr_only = false, aa = false, is_bed = false, use_cds = false, eval_base = false;
+	while ((c = getopt(args, "l:epcab1ds")) != null) {
+		if (c == 'l') l_fuzzy = parseInt(getopt.arg);
+		else if (c == 'e') print_err_only = print_ovlp = true;
+		else if (c == 'p') print_ovlp = true;
+		else if (c == 'c') chr_only = true;
+		else if (c == 'a') aa = true, use_cds = true;
+		else if (c == 'b') is_bed = true;
+		else if (c == '1') first_only = true;
+		else if (c == 'd') use_cds = true;
+		else if (c == 's') eval_base = true;
+	}
+
+	if (args.length - getopt.ind < 1) {
+		print("Usage: paftools.js exoneval [options] <gene.gtf> <aln.sam>");
+		print("Options:");
+		print("  -l INT    tolerance of junction positions (0 for exact) [0]");
+		print("  -d        evaluate coding regions only (exon regions by default)");
+		print("  -a        miniprot PAF as input (force -d)");
+		print("  -p        print overlapping exons");
+		print("  -e        print erroreous overlapping exons");
+		print("  -c        only consider alignments to /^(chr)?([0-9]+|X|Y)$/");
+		print("  -1        only process the first alignment of each query");
+		print("  -b        BED as input");
+		print("  -s        compute base Sn and Sp (more memory)");
+		exit(1);
+	}
+
+	var file, buf = new Bytes();
+
+	warn("Reading reference GTF...");
+	var tr = {};
+	file = args[getopt.ind] == '-'? new File() : new File(args[getopt.ind]);
+	while (file.readline(buf) >= 0) {
+		var m, t = buf.toString().split("\t");
+		if (t[0].charAt(0) == '#') continue;
+		if (use_cds) {
+			if (t[2] != "cds" && t[2] != "CDS") continue;
+		} else {
+			if (t[2] != 'exon') continue;
+		}
+		var st = parseInt(t[3]) - 1;
+		var en = parseInt(t[4]);
+		if ((m = /transcript_id "(\S+)"/.exec(t[8])) == null) continue;
+		var tid = m[1];
+		if (tr[tid] == null) tr[tid] = [t[0], t[6], 0, 0, []];
+		tr[tid][4].push([st, en]); // this keeps transcript
+	}
+	file.close();
+
+	var anno = {};
+	for (var tid in tr) { // traverse each transcript
+		var t = tr[tid];
+		Interval.sort(t[4]);
+		t[2] = t[4][0][0];
+		t[3] = t[4][t[4].length - 1][1];
+		if (anno[t[0]] == null) anno[t[0]] = [];
+		var s = t[4];
+		for (var i = 0; i < s.length; ++i) // traverse each exon
+			anno[t[0]].push([s[i][0], s[i][1]]);
+	}
+	tr = null;
+
+	for (var chr in anno) { // index exons
+		var e = anno[chr];
+		if (e.length == 0) continue;
+		Interval.sort(e);
+		var k = 0;
+		for (var i = 1; i < e.length; ++i) // dedup
+			if (e[i][0] != e[k][0] || e[i][1] != e[k][1])
+				e[++k] = e[i].slice(0);
+		e.length = k + 1;
+		Interval.index_end(e);
+	}
+
+	var n_pri = 0, n_unmapped = 0, n_mapped = 0;
+	var n_exon = 0, n_exon_hit = 0, n_exon_novel = 0;
+
+	file = getopt.ind+1 >= args.length || args[getopt.ind+1] == '-'? new File() : new File(args[getopt.ind+1]);
+	var last_qname = null, qexon = {};
+	var re_cigar = /(\d+)([MIDNSHP=XFGUV])/g;
+
+	warn("Evaluating alignments...");
+	while (file.readline(buf) >= 0) {
+		var m, t = buf.toString().split("\t");
+		var ctg_name = null, cigar = null, pos = null, qname;
+
+		if (t[0].charAt(0) == '@') continue;
+		if (t[0] == "##PAF") t.shift();
+		qname = t[0];
+		if (is_bed) {
+			ctg_name = t[0], pos = parseInt(t[1]), cigar == null;
+		} else if (t[4] == '+' || t[4] == '-' || t[4] == '*') { // PAF
+			ctg_name = t[5], pos = parseInt(t[7]);
+			var type = 'P';
+			for (i = 12; i < t.length; ++i) {
+				if ((m = /^(tp:A|cg:Z):(\S+)/.exec(t[i])) != null) {
+					if (m[1] == 'tp:A') type = m[2];
+					else cigar = m[2];
+				}
+			}
+			if (type == 'S') continue; // secondary
+		} else { // SAM
+			ctg_name = t[2], pos = parseInt(t[3]) - 1, cigar = t[5];
+			var flag = parseInt(t[1]);
+			if (flag&0x100) continue; // secondary
+		}
+
+		if (chr_only && !/^(chr)?([0-9]+|X|Y)$/.test(ctg_name)) continue;
+		if (first_only && last_qname == qname) continue;
+		if (ctg_name == '*') { // unmapped
+			++n_unmapped;
+			continue;
+		} else {
+			++n_pri;
+			if (last_qname != qname) {
+				++n_mapped;
+				last_qname = qname;
+			}
+		}
+
+		var exon = [];
+		if (is_bed) { // BED
+			exon.push([pos, parseInt(t[2])]);
+		} else if (aa) {
+			var tmp_exon = [], tmp = 0, tmp_st = 0;
+			while ((m = re_cigar.exec(cigar)) != null) {
+				var len = parseInt(m[1]), op = m[2];
+				if (op == 'N') {
+					tmp_exon.push([tmp_st, tmp]);
+					tmp_st = tmp + len, tmp += len;
+				} else if (op == 'U') {
+					tmp_exon.push([tmp_st, tmp + 1]);
+					tmp_st = tmp + len - 2, tmp += len;
+				} else if (op == 'V') {
+					tmp_exon.push([tmp_st, tmp + 2]);
+					tmp_st = tmp + len - 1, tmp += len;
+				} else if (op == 'M' || op == 'X' || op == '=' || op == 'D') {
+					tmp += len * 3;
+				} else if (op == 'F' || op == 'G') {
+					tmp += len;
+				}
+			}
+			tmp_exon.push([tmp_st, tmp]);
+			if (t[4] == '+') {
+				for (var i = 0; i < tmp_exon.length; ++i)
+					exon.push([pos + tmp_exon[i][0], pos + tmp_exon[i][1]]);
+			} else if (t[4] == '-') { // For protein-to-genome alignment, the coordinates are on the query strand. Need to flip them.
+				var glen = parseInt(t[8]) - parseInt(t[7]);
+				for (var i = tmp_exon.length - 1; i >= 0; --i)
+					exon.push([pos + (glen - tmp_exon[i][1]), pos + (glen - tmp_exon[i][0])]);
+			}
+		} else {
+			var tmp_st = pos;
+			while ((m = re_cigar.exec(cigar)) != null) {
+				var len = parseInt(m[1]), op = m[2];
+				if (op == 'N') {
+					exon.push([tmp_st, pos]);
+					tmp_st = pos + len, pos += len;
+				} else if (op == 'M' || op == 'X' || op == '=' || op == 'D') pos += len;
+			}
+			exon.push([tmp_st, pos]);
+		}
+		n_exon += exon.length;
+
+		var chr = anno[ctg_name];
+		if (chr != null) {
+			for (var i = 0; i < exon.length; ++i) {
+				if (eval_base) {
+					if (qexon[ctg_name] == null) qexon[ctg_name] = [];
+					qexon[ctg_name].push([exon[i][0], exon[i][1]]);
+				}
+				var o = Interval.find_ovlp(chr, exon[i][0], exon[i][1]);
+				if (o.length > 0) {
+					var hit = false;
+					for (var j = 0; j < o.length; ++j) {
+						var st_diff = exon[i][0] - o[j][0];
+						var en_diff = exon[i][1] - o[j][1];
+						if (st_diff < 0) st_diff = -st_diff;
+						if (en_diff < 0) en_diff = -en_diff;
+						if (st_diff <= l_fuzzy && en_diff <= l_fuzzy)
+							++n_exon_hit, hit = true;
+						if (hit) break;
+					}
+					if (print_ovlp) {
+						var type = hit? 'C' : 'P';
+						if (hit && print_err_only) continue;
+						var x = '[';
+						for (var j = 0; j < o.length; ++j) {
+							if (j) x += ', ';
+							x += '(' + o[j][0] + "," + o[j][1] + ')';
+						}
+						x += ']';
+						print(type, qname, i+1, ctg_name, exon[i][0], exon[i][1], x);
+					}
+				} else {
+					++n_exon_novel;
+					if (print_ovlp)
+						print('N',  qname, i+1, ctg_name, exon[i][0], exon[i][1]);
+				}
+			}
+		} else {
+			n_exon_novel += exon.length;
+		}
+	}
+	file.close();
+
+	buf.destroy();
+
+	if (!print_ovlp) {
+		print("# unmapped reads: " + n_unmapped);
+		print("# mapped reads: " + n_mapped);
+		print("# primary alignments: " + n_pri);
+		print("# predicted exons: " + n_exon);
+		print("# non-overlapping exons: " + n_exon_novel);
+		print("# correct exons: " + n_exon_hit + " (" + (n_exon_hit / n_exon * 100).toFixed(2) + "%)");
+	}
+
+	function merge_and_index(ex) {
+		for (var chr in ex) {
+			var a = [];
+			e = ex[chr];
+			Interval.sort(e);
+			var st = e[0][0], en = e[0][1];
+			for (var i = 1; i < e.length; ++i) { // merge
+				if (e[i][0] > en) {
+					a.push([st, en]);
+					st = e[i][0], en = e[i][1];
+				} else {
+					en = en > e[i][1]? en : e[i][1];
+				}
+			}
+			a.push([st, en]);
+			Interval.index_end(a);
+			ex[chr] = a;
+		}
+	}
+
+	function cal_sn(a0, a1) {
+		var tot = 0, cov = 0;
+		for (var chr in a1) {
+			var e0 = a0[chr], e1 = a1[chr];
+			for (var i = 0; i < e1.length; ++i)
+				tot += e1[i][1] - e1[i][0];
+			if (e0 == null) continue;
+			for (var i = 0; i < e1.length; ++i) {
+				var o = Interval.find_ovlp(e0, e1[i][0], e1[i][1]);
+				for (var j = 0; j < o.length; ++j) { // this only works when there are no overlaps between intervals
+					var st = e1[i][0] > o[j][0]? e1[i][0] : o[j][0];
+					var en = e1[i][1] < o[j][1]? e1[i][1] : o[j][1];
+					cov += en - st;
+				}
+			}
+		}
+		return [tot, cov];
+	}
+
+	if (eval_base) {
+		warn("Computing base Sn and Sp...");
+		merge_and_index(qexon);
+		merge_and_index(anno);
+		var sn = cal_sn(qexon, anno);
+		var sp = cal_sn(anno, qexon);
+		print("Base Sn: " + sn[1] + " / " + sn[0] + " = " + (sn[1] / sn[0] * 100).toFixed(2) + "%");
+		print("Base Sp: " + sp[1] + " / " + sp[0] + " = " + (sp[1] / sp[0] * 100).toFixed(2) + "%");
+	}
+}
+
 // evaluate overlap sensitivity
 function paf_ov_eval(args)
 {
@@ -2704,6 +3019,23 @@ function paf_misjoin(args)
 		return len < (en - st) * cen_ratio? false : true;
 	}
 
+	function test_cen_point(cen, chr, x) {
+		var b = cen[chr];
+		if (b == null) return false;
+		for (var j = 0; j < b.length; ++j)
+			if (x >= b[j][0] && x < b[j][1])
+				return true;
+		return false;
+	}
+
+	if (show_err || show_long) {
+		print("C\tJ  inter-chromosomal misjoin");
+		print("C\tj  inter-chromosomal misjoin with both breakpoints ending in centromeres");
+		print("C\tG  long gap on the reference genome");
+		print("C\tg  long gap on the reference genome with both breakpoints ending in centromeres");
+		print("C\tM  closed inversion");
+		print("C");
+	}
 	function process(a) {
 		var k = 0;
 		for (var i = 0; i < a.length; ++i) {
@@ -2716,14 +3048,17 @@ function paf_misjoin(args)
 		a = a.sort(function(x,y){return x[2]-y[2]});
 		if (show_long) for (var i = 0; i < a.length; ++i) print(a[i].join("\t"));
 		for (var i = 1; i < a.length; ++i) {
-			var ov = [false, false];
+			var ov = [false, false], end_cen = [false, false];
 			ov[0] = test_cen(cen, a[i-1][5], a[i-1][7], a[i-1][8]);
 			ov[1] = test_cen(cen, a[i][5], a[i][7], a[i][8]);
+			end_cen[0] = test_cen_point(cen, a[i-1][5], a[i-1][4] == '+'? a[i-1][8] : a[i-1][7]);
+			end_cen[1] = test_cen_point(cen, a[i][5],   a[i][4] == '+'?   a[i][7]   : a[i][8]);
 			if (a[i-1][5] != a[i][5]) { // different chr
 				if (ov[0] || ov[1]) ++n_diff[1];
 				else if (show_err) {
-					print("J", a[i-1].slice(0, 12).join("\t"));
-					print("J", a[i].slice(0, 12).join("\t"));
+					var label = end_cen[0] && end_cen[1]? 'j' : 'J';
+					print(label, a[i-1].slice(0, 12).join("\t"));
+					print(label, a[i].slice(0, 12).join("\t"));
 				}
 				++n_diff[0];
 			} else if (a[i-1][4] == a[i][4]) { // a gap
@@ -2733,8 +3068,9 @@ function paf_misjoin(args)
 				if (gap > max_gap) {
 					if (ov[0] || ov[1]) ++n_gap[1];
 					else if (show_err) {
-						print("G", a[i-1].slice(0, 12).join("\t"));
-						print("G", a[i].slice(0, 12).join("\t"));
+						var label = end_cen[0] && end_cen[1]? 'g' : 'G';
+						print(label, a[i-1].slice(0, 12).join("\t"));
+						print(label, a[i].slice(0, 12).join("\t"));
 					}
 					++n_gap[0];
 				}
@@ -3084,6 +3420,183 @@ function paf_pafcmp(args)
 	buf.destroy();
 }
 
+function paf_longcs2seq(args) {
+	var c, opt = { query:false };
+	while ((c = getopt(args, "q")) != null)
+		if (c == 'q') opt.query = true;
+	if (args.length == getopt.ind) {
+		print("Usage: paftools.js longcs2seq [-q] <long-cs.paf>");
+		return;
+	}
+	var re_cs = /([:=*+-])(\d+|[A-Za-z]+)/g
+	var buf = new Bytes();
+	var file = args[getopt.ind] == "-"? new File() : new File(args[getopt.ind]);
+	while (file.readline(buf) >= 0) {
+		var m, cs = null, t = buf.toString().split("\t");
+		for (var i = 12; i < t.length; ++i)
+			if ((m = /^cs:Z:(\S+)/.exec(t[i])) != null) {
+				cs = m[1];
+				break;
+			}
+		if (cs == null) continue;
+		var ts = "", qs = "";
+		while ((m = re_cs.exec(cs)) != null) {
+			if (m[1] == "=") ts += m[2], qs += m[2];
+			else if (m[1] == "+") qs += m[2].toUpperCase();
+			else if (m[1] == "-") ts += m[2].toUpperCase();
+			else if (m[1] == "*") ts += m[2][0].toUpperCase(), qs += m[2][1].toUpperCase();
+			else if (m[1] == ":") throw Error("Long cs is required");
+		}
+		if (opt.query) {
+			print(">" + t[0] + "_" + t[2] + "_" + t[3]);
+			print(qs);
+		} else {
+			print(">" + t[5] + "_" + t[7] + "_" + t[8]);
+			print(ts);
+		}
+	}
+	file.close();
+	buf.destroy();
+}
+
+function paf_paf2gff(args) {
+	var c, opt = { aa:false };
+	var re_cigar = /(\d+)([A-Z=])/g;
+	while ((c = getopt(args, "a")) != null) {
+		if (c == 'a') opt.aa = true;
+	}
+	if (args.length == getopt.ind) {
+		print("Usage: paftools.js paf2gff [-a] <in.paf>");
+		return;
+	}
+	var buf = new Bytes();
+	var file = args[getopt.ind] == '-'? new File() : new File(args[getopt.ind]);
+	var hid = 1, last_name = null;
+	while (file.readline(buf) >= 0) {
+		var m, t = buf.toString().split("\t");
+		if (t[5] == '*') continue; // skip unmapped lines
+
+		if (t[0] != last_name) last_name = t[0], hid = 1;
+		else ++hid;
+		for (var i = 1; i <= 3; ++i) t[i] = parseInt(t[i]);
+		for (var i = 6; i <= 11; ++i) t[i] = parseInt(t[i]);
+		var cigar = null, score = null, np = null, dist_stop = null, dist_start = null;
+		for (var i = 12; i < t.length; ++i) {
+			if ((m = /^(cg:Z|AS:i|np:i|da:i|do:i):(\S+)/.exec(t[i])) != null) {
+				if (m[1] == 'cg:Z') cigar = m[2];
+				else if (m[1] == 'AS:i') score = parseInt(m[2]);
+				else if (m[1] == 'np:i') np = parseInt(m[2]);
+				else if (m[1] == 'do:i') dist_stop = parseInt(m[2]);
+				else if (m[1] == 'da:i') dist_start = parseInt(m[2]);
+			}
+		}
+		if (cigar == null) throw Error("failed to find the cg:Z tag");
+		if (score == null) throw Error("failed to find the AS:i tag");
+
+		var st = 0, en = 0, phase = 0, pseudo = false, fs = 0, a = [];
+		if (dist_start != null && dist_start == 0)
+			a.push([t[5], 'paf2gff', 'start_codon', 0, 3, 0, t[4], '.', 0]);
+		while ((m = re_cigar.exec(cigar)) != null) {
+			var len = parseInt(m[1]);
+			if (m[2] == 'M' || m[2] == 'D') {
+				en += opt.aa? len * 3 : len;
+			} else if (m[2] == 'F' || m[2] == 'G' || m[2] == 'R') {
+				en += len, pseudo = true, fs = 1;
+			} else if (m[2] == 'N') {
+				a.push([t[5], 'paf2gff', 'exon', st, en, 0, t[4], phase, fs]);
+				st = en + len, en += len, phase = 0, fs = 0;
+			} else if (m[2] == 'U') { // ...xGT...AGxx...
+				a.push([t[5], 'paf2gff', 'exon', st, en + 1, 0, t[4], phase, fs]);
+				st = en + len - 2, en += len, phase = 2, fs = 0;
+			} else if (m[2] == 'V') { // ...xxGT...AGx...
+				a.push([t[5], 'paf2gff', 'exon', st, en + 2, 0, t[4], phase, fs]);
+				st = en + len - 1, en += len, phase = 1, fs = 0;
+			}
+		}
+		a.push([t[5], 'paf2gff', 'exon', st, en, 0, t[4], phase, fs]);
+		if (en != t[8] - t[7]) throw Error("inconsistent cigar");
+		if (dist_stop != null && dist_stop == 0)
+			a.push([t[5], 'paf2gff', 'stop_codon', en, en + 3, 0, t[4], '.', 0]);
+		var type = pseudo? 'pseudogene' : 'protein_coding';
+		var attr = ['transcript_id=' + t[0] + '#' + hid, 'transcript_type=' + type].join(";");
+		var trans_attr = 'identity=' + (t[9] / t[10]).toFixed(4);
+		if (np != null) trans_attr += ';positive=' + (np * 3 / t[10]).toFixed(4);
+		trans_attr += ';aa_start=' + t[2];
+		trans_attr += ';aa_end=' + (t[1] - t[3]);
+		if (dist_start != null && dist_start >= 0) trans_attr += ';dist_start_codon=' + dist_start;
+		if (dist_stop != null && dist_stop >= 0) trans_attr += ';dist_stop_codon=' + dist_stop;
+		var trans_st = t[7], trans_en = t[8];
+		if (dist_stop != null && dist_stop == 0) {
+			if (t[4] == '-') trans_st -= 3;
+			else trans_en += 3;
+		}
+		print([t[5], 'paf2gff', 'transcript', trans_st + 1, trans_en, score, t[4], '.', attr + ';' + trans_attr].join("\t"));
+		if (opt.aa && t[4] == '-') {
+			var b = [], len = t[8] - t[7];
+			for (var i = a.length - 1; i >= 0; --i) {
+				var x = len - a[i][3];
+				a[i][3] = len - a[i][4];
+				a[i][4] = x;
+				//a[i][7] = a[i][7] == 0? 0 : 3 - a[i][7]; // not sure if this line is needed
+				b.push(a[i]);
+			}
+			a = b;
+		}
+		for (var i = 0; i < a.length; ++i) {
+			if (!pseudo && a[i][2] == "exon") a[i][2] = "CDS";
+			a[i][3] += t[7] + 1;
+			a[i][4] += t[7];
+			a[i][8] = attr + ";frameshift=" + a[i][8];
+			print(a[i].join("\t"));
+		}
+	}
+	file.close();
+	buf.destroy();
+}
+
+function paf_gff2junc(args) {
+	var c, feat = "CDS";
+	while ((c = getopt(args, "f:")) != null) {
+		if (c == 'f') feat = getopt.arg;
+	}
+	if (getopt.ind == args.length) {
+		print("Usage: paftools.js gff2junc [-f feature] <in.gff3>");
+		return;
+	}
+	var buf = new Bytes();
+	var file = args[getopt.ind] == "-"? new File() : new File(args[getopt.ind]);
+
+	function process_a(a) {
+		if (a.length < 2) return;
+		a = a.sort(function(x, y) { return x[4] - y[4] });
+		for (var i = 1; i < a.length; ++i)
+			print([a[i][1], a[i-1][5], a[i][4], a[i][0], 0, a[i][7]].join("\t"));
+	}
+
+	var a = [];
+	while (file.readline(buf) >= 0) {
+		var m, t = buf.toString().split("\t");
+		if (t[0][0] == '#') continue;
+		if (t[2].toLowerCase() != feat.toLowerCase()) continue;
+		//print(t.join("\t"));
+		if ((m = /\bParent=([^;]+)/.exec(t[8])) == null) {
+			warn("Can't find Parent");
+			continue;
+		}
+		t[3] = parseInt(t[3]) - 1;
+		t[4] = parseInt(t[4]);
+		t.unshift(m[1]);
+		if (a.length > 0 && a[0][0] != m[1]) {
+			process_a(a);
+			a.length = 0;
+			a.push(t);
+		} else a.push(t);
+	}
+	process_a(a);
+	file.close();
+	buf.destroy();
+}
+
 /*************************
  ***** main function *****
  *************************/
@@ -3098,6 +3611,9 @@ function main(args)
 		print("  sam2paf    convert SAM to PAF");
 		print("  delta2paf  convert MUMmer's delta to PAF");
 		print("  gff2bed    convert GTF/GFF3 to BED12");
+		print("  gff2junc   convert GFF3 to junction BED");
+		print("  longcs2seq convert long-cs PAF to sequences");
+//		print("  paf2gff    convert PAF to GFF3 (tested for miniprot only)");
 		print("");
 		print("  stat       collect basic mapping information in PAF/SAM");
 		print("  asmstat    collect basic assembly information");
@@ -3115,6 +3631,7 @@ function main(args)
 		print("  mason2fq   convert mason2-simulated SAM to FASTQ");
 		print("  pbsim2fq   convert PBSIM-simulated MAF to FASTQ");
 		print("  junceval   evaluate splice junction consistency with known annotations");
+		print("  exoneval   evaluate exon-level consistency with known annotations");
 		print("  ov-eval    evaluate read overlap sensitivity using read-to-ref mapping");
 		exit(1);
 	}
@@ -3125,6 +3642,7 @@ function main(args)
 	else if (cmd == 'delta2paf') paf_delta2paf(args);
 	else if (cmd == 'splice2bed') paf_splice2bed(args);
 	else if (cmd == 'gff2bed') paf_gff2bed(args);
+	else if (cmd == 'gff2junc') paf_gff2junc(args);
 	else if (cmd == 'stat') paf_stat(args);
 	else if (cmd == 'asmstat') paf_asmstat(args);
 	else if (cmd == 'asmgene') paf_asmgene(args);
@@ -3138,10 +3656,13 @@ function main(args)
 	else if (cmd == 'mason2fq') paf_mason2fq(args);
 	else if (cmd == 'pbsim2fq') paf_pbsim2fq(args);
 	else if (cmd == 'junceval') paf_junceval(args);
+	else if (cmd == 'exoneval') paf_exoneval(args);
 	else if (cmd == 'ov-eval') paf_ov_eval(args);
 	else if (cmd == 'vcfstat') paf_vcfstat(args);
 	else if (cmd == 'sveval') paf_sveval(args);
 	else if (cmd == 'vcfsel') paf_vcfsel(args);
+	else if (cmd == 'longcs2seq') paf_longcs2seq(args);
+	else if (cmd == 'paf2gff') paf_paf2gff(args);
 	else if (cmd == 'version') print(paftools_version);
 	else throw Error("unrecognized command: " + cmd);
 }
diff --git a/options.c b/options.c
index 235b6dd..b3344b5 100644
--- a/options.c
+++ b/options.c
@@ -8,7 +8,7 @@ void mm_idxopt_init(mm_idxopt_t *opt)
 	opt->k = 15, opt->w = 10, opt->flag = 0;
 	opt->bucket_bits = 14;
 	opt->mini_batch_size = 50000000;
-	opt->batch_size = 4000000000ULL;
+	opt->batch_size = 8000000000ULL;
 }
 
 void mm_mapopt_init(mm_mapopt_t *opt)
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..bc2be67
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,2 @@
+[build-system]
+requires = ["setuptools", "wheel", "Cython"]
diff --git a/python/mappy.pyx b/python/mappy.pyx
index cf47632..53eea5c 100644
--- a/python/mappy.pyx
+++ b/python/mappy.pyx
@@ -3,7 +3,7 @@ from libc.stdlib cimport free
 cimport cmappy
 import sys
 
-__version__ = '2.24'
+__version__ = '2.26'
 
 cmappy.mm_reset_timer()
 
@@ -172,6 +172,7 @@ cdef class Aligner:
 		cdef cmappy.mm_mapopt_t map_opt
 
 		if self._idx == NULL: return
+		if ((self.map_opt.flag & 4) and (self._idx.flag & 2)): return
 		map_opt = self.map_opt
 		if max_frag_len is not None: map_opt.max_frag_len = max_frag_len
 		if extra_flags is not None: map_opt.flag |= extra_flags
@@ -217,6 +218,7 @@ cdef class Aligner:
 		cdef int l
 		cdef char *s
 		if self._idx == NULL: return
+		if ((self.map_opt.flag & 4) and (self._idx.flag & 2)): return
 		s = cmappy.mappy_fetch_seq(self._idx, name.encode(), start, end, &l)
 		if l == 0: return None
 		r = s[:l] if isinstance(s, str) else s[:l].decode()
diff --git a/seed.c b/seed.c
index 08fe5ad..76a67ae 100644
--- a/seed.c
+++ b/seed.c
@@ -7,7 +7,7 @@ void mm_seed_mz_flt(void *km, mm128_v *mv, int32_t q_occ_max, float q_occ_frac)
 	mm128_t *a;
 	size_t i, j, st;
 	if (mv->n <= q_occ_max || q_occ_frac <= 0.0f || q_occ_max <= 0) return;
-	KMALLOC(km, a, mv->n);
+	a = Kmalloc(km, mm128_t, mv->n);
 	for (i = 0; i < mv->n; ++i)
 		a[i].x = mv->a[i].x, a[i].y = i;
 	radix_sort_128x(a, a + mv->n);
diff --git a/setup.py b/setup.py
index e6766fa..fa6d086 100644
--- a/setup.py
+++ b/setup.py
@@ -23,7 +23,7 @@ def readme():
 
 setup(
 	name = 'mappy',
-	version = '2.24',
+	version = '2.26',
 	url = 'https://github.com/lh3/minimap2',
 	description = 'Minimap2 python binding',
 	long_description = readme(),

Debdiff

[The following lists of changes regard files as different if they have different names, permissions or owners.]

Files in second set of .debs but not in first

-rw-r--r--  root/root   /usr/lib/debug/.build-id/1b/6f92c9c5c46fc82bbbae73ba73666294d91332.debug
-rw-r--r--  root/root   /usr/lib/debug/.build-id/7f/cb2c4a5a3ec3d34989bede236759e17a88c24c.debug
-rw-r--r--  root/root   /usr/lib/python3/dist-packages/mappy-2.26.egg-info/PKG-INFO
-rw-r--r--  root/root   /usr/lib/python3/dist-packages/mappy-2.26.egg-info/dependency_links.txt
-rw-r--r--  root/root   /usr/lib/python3/dist-packages/mappy-2.26.egg-info/top_level.txt

Files in first set of .debs but not in second

-rw-r--r--  root/root   /usr/lib/debug/.build-id/00/5b1006cf2ac31adb643ccb34d47f4cf119b796.debug
-rw-r--r--  root/root   /usr/lib/debug/.build-id/70/2591777856967d35babe8ecaaddfb6b181faf5.debug
-rw-r--r--  root/root   /usr/lib/python3/dist-packages/mappy-2.24.egg-info/PKG-INFO
-rw-r--r--  root/root   /usr/lib/python3/dist-packages/mappy-2.24.egg-info/dependency_links.txt
-rw-r--r--  root/root   /usr/lib/python3/dist-packages/mappy-2.24.egg-info/top_level.txt

No differences were encountered between the control files of package libminimap2-dev

No differences were encountered between the control files of package minimap2

Control files of package minimap2-dbgsym: lines which differ (wdiff format)

  • Build-Ids: 702591777856967d35babe8ecaaddfb6b181faf5 7fcb2c4a5a3ec3d34989bede236759e17a88c24c

No differences were encountered between the control files of package python3-mappy

Control files of package python3-mappy-dbgsym: lines which differ (wdiff format)

  • Build-Ids: 005b1006cf2ac31adb643ccb34d47f4cf119b796 1b6f92c9c5c46fc82bbbae73ba73666294d91332

More details

Full run details