Codebase list gmap / 2deaf4f
Imported Upstream version 2016-08-16 Alexandre Mestiashvili 7 years ago
45 changed file(s) with 3451 addition(s) and 3588 deletion(s).
0 2016-08-16 twu
1
2 * VERSION: Updated version number
3
4 * README: Discussing MAX_STACK_READLENGTH
5
6 * gsnap.c, uniqscan.c: Using MAX_FLOORS_READLENGTH instead of MAX_READLENGTH
7
8 * configure.ac: Using MAX_STACK_READLENGTH instead of MAX_READLENGTH
9
10 * Makefile.gsnaptoo.am: Using MAX_STACK_READLENGTH instead of MAX_READLENGTH
11
12 * stage1hr.h: Adding max_floor_readlength to setup
13
14 * stage1hr.c: Removed local allocation of arrays of size MAX_READLENGTH.
15 Now checking querylength against MAX_STACK_READLENGTH to determine whether
16 to allocate from stack or heap. Adding max_floor_readlength to setup
17
18 * indel.c, mapq.c, sarray-read.c, splice.c: Removed local allocation of
19 arrays of size MAX_READLENGTH. Now checking querylength against
20 MAX_STACK_READLENGTH to determine whether to allocate from stack or heap
21
22 * stage3hr.c: Not allowing any indels to set trims in determining optimal
23 score
24
25 * stage1hr.c: Using pre-processor macro LONG_READLENGTHS to allocate
26 read-related memory on heap instead of stack. Setting spliceable_high_p
27 to be false for last segment. In computing end indels, ensuring that
28 shifti is not negative when looking up array value.
29
30 * shortread.c: Using MAX_EXPECTED_READLENGTH instead of MAX_READLENGTH
31
32 * stage3.c: Handling the case when trimming ends that exon is empty
33
34 * stage3hr.c: Restored setting of abort_pairing_p when nconcordant exceeds
35 maxpairedpaths
36
37 * gsnap.c, uniqscan.c: Using new interface to Pair_setup
38
39 * indel.c, mapq.c, sarray-read.c, splice.c, substring.c: Using pre-processor
40 macro LONG_READLENGTHS to allocate read-related memory on heap instead of
41 stack
42
43 * gmap.c, pair.c, pair.h: Added option --gff3-swap-phase
44
45 * bytecoding.c: Added explanation messages to remove shared memory segments
46
47 2016-08-12 twu
48
49 * Makefile.gsnaptoo.am, config.site.rescomp.prd, configure.ac, filestring.c,
50 genome_sites.c, gsnap.c, pair.c, samprint.c, sarray-read.c, sedgesort.c,
51 sedgesort.h, shortread.c, splice.c, src, stage1hr.c, stage3hr.c,
52 stage3hr.h, substring.c, substring.h, trunk, univdiag.c, univdiag.h, util:
53 Merged revisions 195608 to 196272 from
54 branches/2016-08-09-genome-sites-hr, which contains merged revisions from
55 branches/2016-08-02-long-read-fusions and 2016-07-01-better-triage
56
57 * VERSION, trunk: Updated version number
58
59 * Makefile.gsnaptoo.am: Removed chrsubset.c and chrsubset.h for
60 splicing-score
61
62 * pair.c: Added variable to swap phase for gff3 output
63
64 * configure.ac: Added a line to disable maintainer mode for users
65
66 * config.site.rescomp.prd, config.site.rescomp.tst: Updated for latest
67 version
68
69 * MAINTAINER: Added note about PATH
70
71 * archive.html, index.html: Updated for latest version
72
073 2016-08-08 twu
174
2 * atoi.c, cmet.c: Fixed reduce procedures for 64-bit oligos
3
4 * stage1hr.c: Fixed values of splice_pos_start and splice_pos_end given to
5 Genome_donor_positions and related functions
6
7 * filestring.c: Handling the case where stringlen is negative
8
9 * stage3.c: Merged revision 195962 from trunk to fix an issue where we tried
10 to use pairs_pretrim after path_trim altered the pairs
11
12 * samprint.c, substring.c, substring.h: Merged revision 195960 from trunk to
13 fix XT field to have correct fusion coordinates
14
15 2016-08-04 twu
16
17 * 2016-08-02-long-read-fusions, comp.h, config.site.rescomp.prd, pair.c,
18 pairpool.c, sarray-read.c, src, stage3.c, util: Merged revisions 195492
19 through 195762 from branches/2016-07-01-better-triage to get latest fixes
20
21 * 2016-08-02-long-read-fusions, Makefile.gsnaptoo.am, comp.h,
22 config.site.rescomp.prd, configure.ac, filestring.c, gsnap.c, pair.c,
23 pairpool.c, samprint.c, sarray-read.c, sedgesort.c, sedgesort.h,
24 shortread.c, src, stage1hr.c, stage3.c, stage3hr.c, stage3hr.h,
25 substring.c, substring.h, univdiag.c, univdiag.h: Merged revisions 193240
26 to 195491 from branches/2016-07-01-better-triage for better performance
27
28 2016-08-03 twu
29
30 * stage1hr.c: Hard-coded some values for plusp
31
32 * splice.c: Using new interface to Substring_new_donor and
33 Substring_new_acceptor
34
35 * stage1hr.c: In computing spliceable segments, using a variable for holding
36 previous spliceable information, to resolve writing to an uninitialized
37 ptr at end. Using a streamlined version of splicing for distant RNA.
38
39 * substring.c, substring.h: Added parameters substring_querystart and
40 substring_queryend to Substring_new_donor and Substring_new_acceptor, so
41 we can handle splicing segments in the middle of the read
42
43 * genome_sites.c: Added debugging statements
44
45 * stage1hr.c: Allowing fusions to occur between middle segments that are
46 spliceable on their distal ends
75 * gtf_genes.pl.in, gtf_introns.pl.in, gtf_splicesites.pl.in: Printing both
76 gene_id and gene_name
77
78 * atoi.c, cmet.c: Fixed reduce procedures for 64-bit computers
79
80 * Makefile.gsnaptoo.am: Added semaphore.c and semaphore.h to list of files
81 for splicing-score
82
83 * stage1hr.c: Fixed debugging statements
84
85 * stage3.c: Fixed issue where we tried to use pairs_pretrim after path_trim
86 altered the pairs
87
88 * samprint.c, substring.c, substring.h: Fixed XT field to print correct
89 junction coordinates
4790
4891 2016-08-02 twu
49
50 * 2016-08-02-long-read-fusions: Created branch to find fusions in long reads
5192
5293 * stage3hr.c: Restoring final procedure based on nmatches in
5394 Stage3pair_optimal_score
183183 $(top_srcdir)/config/config.sub \
184184 $(top_srcdir)/config/install-sh $(top_srcdir)/config/ltmain.sh \
185185 $(top_srcdir)/config/missing AUTHORS COPYING ChangeLog INSTALL \
186 NEWS README config/compile config/config.guess \
186 NEWS README TODO config/compile config/config.guess \
187187 config/config.sub config/install-sh config/ltmain.sh \
188188 config/missing
189189 DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
272272 LN_S = @LN_S@
273273 LTLIBOBJS = @LTLIBOBJS@
274274 LT_SYS_LIBRARY_PATH = @LT_SYS_LIBRARY_PATH@
275 MAINT = @MAINT@
275276 MAKEINFO = @MAKEINFO@
276277 MANIFEST_TOOL = @MANIFEST_TOOL@
277 MAX_READLENGTH = @MAX_READLENGTH@
278 MAX_STACK_READLENGTH = @MAX_STACK_READLENGTH@
278279 MKDIR_P = @MKDIR_P@
279280 MPICC = @MPICC@
280281 MPILIBS = @MPILIBS@
378379 .SUFFIXES:
379380 am--refresh: Makefile
380381 @:
381 $(srcdir)/Makefile.in: $(srcdir)/Makefile.am $(am__configure_deps)
382 $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
382383 @for dep in $?; do \
383384 case '$(am__configure_deps)' in \
384385 *$$dep*) \
404405 $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
405406 $(SHELL) ./config.status --recheck
406407
407 $(top_srcdir)/configure: $(am__configure_deps)
408 $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
408409 $(am__cd) $(srcdir) && $(AUTOCONF)
409 $(ACLOCAL_M4): $(am__aclocal_m4_deps)
410 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
410411 $(am__cd) $(srcdir) && $(ACLOCAL) $(ACLOCAL_AMFLAGS)
411412 $(am__aclocal_m4_deps):
412413
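The effect of the @MAINTAINER_MODE_TRUE@ placeholder in the rules above can be sketched in shell. This is only an illustration under stated assumptions: the helper names maintainer_mode_true and render_rule are hypothetical and do not exist in the gmap sources or in Automake. The sketch relies on the behavior visible in the generated configure script, which sets MAINTAINER_MODE_TRUE to '#' when maintainer mode is off and to an empty string when it is on.

```shell
#!/bin/sh
# Hypothetical helper: prints the text that replaces @MAINTAINER_MODE_TRUE@.
# Argument: "yes" when maintainer mode is enabled, anything else otherwise.
maintainer_mode_true() {
    if test "$1" = yes; then
        printf ''     # empty: the rule's prerequisites stay active
    else
        printf '#'    # comment marker: the prerequisites are commented out
    fi
}

# Hypothetical helper: expands the placeholder in one Makefile.in rule line.
render_rule() {
    prefix=$(maintainer_mode_true "$1")
    printf '%s\n' "\$(top_srcdir)/configure: ${prefix} \$(am__configure_deps)"
}

render_rule no    # prerequisites hidden behind '#'
render_rule yes   # prerequisites active
```

With maintainer mode off (the default requested by AM_MAINTAINER_MODE([disable])), the prerequisites end up behind a comment marker, so make does not try to regenerate configure on a user's machine just because file timestamps differ.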
5050 ./configure CONFIG_SITE=<config site file>
5151
5252
53 Note 3: GSNAP is designed for short reads of a limited length, and
54 uses a configure variable called MAX_READLENGTH (default 300) as a
55 guide to the maximum read length. You may set this variable by
56 providing it to configure like this
57
58 ./configure MAX_READLENGTH=<length>
53 Note 3: GSNAP previously had a configure variable called
54 MAX_READLENGTH (default 300) as a guide to the maximum read length.
55 That variable is no longer needed, since GSNAP can align reads of
56 arbitrary length. (But, for longer reads, GMAP will probably be much
57 faster.)
58
59 However, whenever possible, based on the length of the read, GSNAP
60 will use stack memory instead of heap memory for some algorithms. To
61 control this decision, there is a variable called
62 MAX_STACK_READLENGTH, set like this
63
64 ./configure MAX_STACK_READLENGTH=<length>
5965
6066 or by defining it in your config.site file (or in the file provided to
6167 configure as the value of CONFIG_SITE). Or you may set the value of
62 MAX_READLENGTH as an environment variable before calling ./configure.
63 If you do not set MAX_READLENGTH, it will have the default value shown
64 when you run "./configure --help".
65
66 Note that MAX_READLENGTH applies only to GSNAP. GMAP, on the other
67 hand, can process queries up to 1 million bp.
68
69 Also, starting with version 2014-08-20, if your C compiler can
70 handle stack-based memory allocation using the alloca() function,
71 GSNAP ignores MAX_READLENGTH, and can handle reads longer than that
72 value.
68 MAX_STACK_READLENGTH as an environment variable before calling
69 ./configure. If you set MAX_STACK_READLENGTH too high, you may
70 overflow the amount of stack allocated by your computer. If you do
71 not set MAX_STACK_READLENGTH, it will have a default value of 300.
7372
7473
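The fallback described in Note 3 can be sketched in shell. This is a minimal illustration, not the code that configure actually runs: the function name resolve_max_stack_readlength is hypothetical. Only the empty-value test and the default of 300 are taken from the source.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the empty-value test used for
# MAX_STACK_READLENGTH. Prints the supplied value, or 300 when none is given.
resolve_max_stack_readlength() {
    requested=${1-}
    if test x"$requested" = x; then
        requested=300    # default value stated in the README
    fi
    printf '%s\n' "$requested"
}

# No value supplied on the command line, in config.site, or in the environment.
resolve_max_stack_readlength ""

# Value supplied, as with: ./configure MAX_STACK_READLENGTH=500
resolve_max_stack_readlength 500
```

The value 500 is only an example. As the note warns, a value that is too high may overflow the stack available on your system.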
7574 Note 4: GSNAP can read from gzip-compressed FASTA or FASTQ input
0
1 Add flag that allows for splitting afterwards.
2
0 2016-08-08
0 2016-08-16
644644 rmdir .tst 2>/dev/null
645645 AC_SUBST([am__leading_dot])])
646646
647 # Add --enable-maintainer-mode option to configure. -*- Autoconf -*-
648 # From Jim Meyering
649
650 # Copyright (C) 1996-2014 Free Software Foundation, Inc.
651 #
652 # This file is free software; the Free Software Foundation
653 # gives unlimited permission to copy and/or distribute it,
654 # with or without modifications, as long as this notice is preserved.
655
656 # AM_MAINTAINER_MODE([DEFAULT-MODE])
657 # ----------------------------------
658 # Control maintainer-specific portions of Makefiles.
659 # Default is to disable them, unless 'enable' is passed literally.
660 # For symmetry, 'disable' may be passed as well. Anyway, the user
661 # can override the default with the --enable/--disable switch.
662 AC_DEFUN([AM_MAINTAINER_MODE],
663 [m4_case(m4_default([$1], [disable]),
664 [enable], [m4_define([am_maintainer_other], [disable])],
665 [disable], [m4_define([am_maintainer_other], [enable])],
666 [m4_define([am_maintainer_other], [enable])
667 m4_warn([syntax], [unexpected argument to AM@&t@_MAINTAINER_MODE: $1])])
668 AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
669 dnl maintainer-mode's default is 'disable' unless 'enable' is passed
670 AC_ARG_ENABLE([maintainer-mode],
671 [AS_HELP_STRING([--]am_maintainer_other[-maintainer-mode],
672 am_maintainer_other[ make rules and dependencies not useful
673 (and sometimes confusing) to the casual installer])],
674 [USE_MAINTAINER_MODE=$enableval],
675 [USE_MAINTAINER_MODE=]m4_if(am_maintainer_other, [enable], [no], [yes]))
676 AC_MSG_RESULT([$USE_MAINTAINER_MODE])
677 AM_CONDITIONAL([MAINTAINER_MODE], [test $USE_MAINTAINER_MODE = yes])
678 MAINT=$MAINTAINER_MODE_TRUE
679 AC_SUBST([MAINT])dnl
680 ]
681 )
682
647683 # Check to see how 'make' treats includes. -*- Autoconf -*-
648684
649685 # Copyright (C) 2001-2014 Free Software Foundation, Inc.
00 #! /bin/sh
11 # Guess values for system-dependent variables and create Makefiles.
2 # Generated by GNU Autoconf 2.69 for gmap 2016-08-08.
2 # Generated by GNU Autoconf 2.69 for gmap 2016-08-16.
33 #
44 # Report bugs to <Thomas Wu <twu@gene.com>>.
55 #
589589 # Identity of this package.
590590 PACKAGE_NAME='gmap'
591591 PACKAGE_TARNAME='gmap'
592 PACKAGE_VERSION='2016-08-08'
593 PACKAGE_STRING='gmap 2016-08-08'
592 PACKAGE_VERSION='2016-08-16'
593 PACKAGE_STRING='gmap 2016-08-16'
594594 PACKAGE_BUGREPORT='Thomas Wu <twu@gene.com>'
595595 PACKAGE_URL=''
596596
637637 LIBOBJS
638638 BZLIB_LIBS
639639 ZLIB_LIBS
640 MAX_READLENGTH
640 MAX_STACK_READLENGTH
641641 GMAPDB
642642 MAKE_SSE2_FALSE
643643 MAKE_SSE2_TRUE
693693 MAINTAINER_TRUE
694694 FULLDIST_FALSE
695695 FULLDIST_TRUE
696 MAINT
697 MAINTAINER_MODE_FALSE
698 MAINTAINER_MODE_TRUE
696699 AM_BACKSLASH
697700 AM_DEFAULT_VERBOSITY
698701 AM_DEFAULT_V
794797 enable_largefile
795798 enable_dependency_tracking
796799 enable_silent_rules
800 enable_maintainer_mode
797801 enable_fulldist
798802 enable_maintainer
799803 enable_shared
826830 MPICC
827831 LT_SYS_LIBRARY_PATH
828832 CPP
829 MAX_READLENGTH'
833 MAX_STACK_READLENGTH'
830834
831835
832836 # Initialize some variables set by options.
13671371 # Omit some internal or obsolete options to make the list less imposing.
13681372 # This message is too long to be a string in the A/UX 3.1 sh.
13691373 cat <<_ACEOF
1370 \`configure' configures gmap 2016-08-08 to adapt to many kinds of systems.
1374 \`configure' configures gmap 2016-08-16 to adapt to many kinds of systems.
13711375
13721376 Usage: $0 [OPTION]... [VAR=VALUE]...
13731377
14381442
14391443 if test -n "$ac_init_help"; then
14401444 case $ac_init_help in
1441 short | recursive ) echo "Configuration of gmap 2016-08-08:";;
1445 short | recursive ) echo "Configuration of gmap 2016-08-16:";;
14421446 esac
14431447 cat <<\_ACEOF
14441448
14531457 speeds up one-time build
14541458 --enable-silent-rules less verbose build output (undo: "make V=1")
14551459 --disable-silent-rules verbose build output (undo: "make V=0")
1460 --enable-maintainer-mode
1461 enable make rules and dependencies not useful (and
1462 sometimes confusing) to the casual installer
14561463 --enable-fulldist For use by program maintainer
14571464 --enable-maintainer For use by program maintainer
14581465 --enable-shared[=PKGS] build shared libraries [default=yes]
15041511 LT_SYS_LIBRARY_PATH
15051512 User-defined run-time library search path.
15061513 CPP C preprocessor
1507 MAX_READLENGTH
1508 Maximum read length for GSNAP (default 300)
1514 MAX_STACK_READLENGTH
1515 Maximum read length for GSNAP allocating on stack rather than
1516 heap (default 300)
15091517
15101518 Use these variables to override the choices made by `configure' or to help
15111519 it to find libraries and programs with nonstandard names/locations.
15731581 test -n "$ac_init_help" && exit $ac_status
15741582 if $ac_init_version; then
15751583 cat <<\_ACEOF
1576 gmap configure 2016-08-08
1584 gmap configure 2016-08-16
15771585 generated by GNU Autoconf 2.69
15781586
15791587 Copyright (C) 2012 Free Software Foundation, Inc.
21792187 This file contains any messages produced by compilers while
21802188 running configure, to aid debugging if configure makes a mistake.
21812189
2182 It was created by gmap $as_me 2016-08-08, which was
2190 It was created by gmap $as_me 2016-08-16, which was
21832191 generated by GNU Autoconf 2.69. Invocation command line was
21842192
21852193 $ $0 $@
25292537
25302538 { $as_echo "$as_me:${as_lineno-$LINENO}: checking package version" >&5
25312539 $as_echo_n "checking package version... " >&6; }
2532 { $as_echo "$as_me:${as_lineno-$LINENO}: result: 2016-08-08" >&5
2533 $as_echo "2016-08-08" >&6; }
2540 { $as_echo "$as_me:${as_lineno-$LINENO}: result: 2016-08-16" >&5
2541 $as_echo "2016-08-16" >&6; }
25342542
25352543
25362544 ### Read defaults
43954403
43964404 # Define the identity of the package.
43974405 PACKAGE='gmap'
4398 VERSION='2016-08-08'
4406 VERSION='2016-08-16'
43994407
44004408
44014409 cat >>confdefs.h <<_ACEOF
46154623 as_fn_error $? "Your 'rm' program is bad, sorry." "$LINENO" 5
46164624 fi
46174625 fi
4626
4627
4628 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to enable maintainer-specific portions of Makefiles" >&5
4629 $as_echo_n "checking whether to enable maintainer-specific portions of Makefiles... " >&6; }
4630 # Check whether --enable-maintainer-mode was given.
4631 if test "${enable_maintainer_mode+set}" = set; then :
4632 enableval=$enable_maintainer_mode; USE_MAINTAINER_MODE=$enableval
4633 else
4634 USE_MAINTAINER_MODE=no
4635 fi
4636
4637 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $USE_MAINTAINER_MODE" >&5
4638 $as_echo "$USE_MAINTAINER_MODE" >&6; }
4639 if test $USE_MAINTAINER_MODE = yes; then
4640 MAINTAINER_MODE_TRUE=
4641 MAINTAINER_MODE_FALSE='#'
4642 else
4643 MAINTAINER_MODE_TRUE='#'
4644 MAINTAINER_MODE_FALSE=
4645 fi
4646
4647 MAINT=$MAINTAINER_MODE_TRUE
4648
4649
46184650
46194651
46204652 if test "x$enable_fulldist" = xyes; then
1523715269
1523815270 #AC_FUNC_MMAP # Checks only private fixed mapping of already-mapped memory
1523915271
15272
1524015273 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether alloca is enabled" >&5
1524115274 $as_echo_n "checking whether alloca is enabled... " >&6; }
1524215275 # Check whether --enable-alloca was given.
1891718950 $as_echo "$GMAPDB" >&6; }
1891818951
1891918952
18920 # MAX_READLENGTH
18921 { $as_echo "$as_me:${as_lineno-$LINENO}: checking MAX_READLENGTH" >&5
18922 $as_echo_n "checking MAX_READLENGTH... " >&6; }
18923
18924 if test x"$MAX_READLENGTH" = x; then
18925
18926 EXP_VAR=MAX_READLENGTH
18953 # MAX_STACK_READLENGTH
18954 { $as_echo "$as_me:${as_lineno-$LINENO}: checking MAX_STACK_READLENGTH" >&5
18955 $as_echo_n "checking MAX_STACK_READLENGTH... " >&6; }
18956
18957 if test x"$MAX_STACK_READLENGTH" = x; then
18958
18959 EXP_VAR=MAX_STACK_READLENGTH
1892718960 FROM_VAR='300'
1892818961
1892918962 prefix_save=$prefix
1894418977 done
1894518978
1894618979 full_var=$new_full_var
18947 MAX_READLENGTH="$full_var"
18980 MAX_STACK_READLENGTH="$full_var"
1894818981
1894918982
1895018983 prefix=$prefix_save
1895118984 exec_prefix=$exec_prefix_save
1895218985
1895318986 fi
18954 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $MAX_READLENGTH" >&5
18955 $as_echo "$MAX_READLENGTH" >&6; }
18987 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $MAX_STACK_READLENGTH" >&5
18988 $as_echo "$MAX_STACK_READLENGTH" >&6; }
1895618989
1895718990
1895818991 # zlib package
1964119674 am__EXEEXT_FALSE=
1964219675 fi
1964319676
19677 if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then
19678 as_fn_error $? "conditional \"MAINTAINER_MODE\" was never defined.
19679 Usually this means the macro was only invoked conditionally." "$LINENO" 5
19680 fi
1964419681 if test -z "${FULLDIST_TRUE}" && test -z "${FULLDIST_FALSE}"; then
1964519682 as_fn_error $? "conditional \"FULLDIST\" was never defined.
1964619683 Usually this means the macro was only invoked conditionally." "$LINENO" 5
2007120108 # report actual input values of CONFIG_FILES etc. instead of their
2007220109 # values after options handling.
2007320110 ac_log="
20074 This file was extended by gmap $as_me 2016-08-08, which was
20111 This file was extended by gmap $as_me 2016-08-16, which was
2007520112 generated by GNU Autoconf 2.69. Invocation command line was
2007620113
2007720114 CONFIG_FILES = $CONFIG_FILES
2013720174 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
2013820175 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
2013920176 ac_cs_version="\\
20140 gmap config.status 2016-08-08
20177 gmap config.status 2016-08-16
2014120178 configured by $0, generated by GNU Autoconf 2.69,
2014220179 with options \\"\$ac_cs_config\\"
2014320180
110110 #AM_INIT_AUTOMAKE([no-dependencies])
111111 #AM_INIT_AUTOMAKE(AC_PACKAGE_NAME, AC_PACKAGE_VERSION)
112112 AM_INIT_AUTOMAKE
113 AM_MAINTAINER_MODE([disable])
114
113115
114116 AM_CONDITIONAL(FULLDIST,test "x$enable_fulldist" = xyes)
115117 AC_ARG_ENABLE([fulldist],
260262
261263 #AC_FUNC_MMAP # Checks only private fixed mapping of already-mapped memory
262264
265
263266 AC_MSG_CHECKING(whether alloca is enabled)
264267 AC_ARG_ENABLE([alloca],
265268 AC_HELP_STRING([--enable-alloca],
592595 AC_MSG_RESULT($GMAPDB)
593596
594597
595 # MAX_READLENGTH
596 AC_MSG_CHECKING(MAX_READLENGTH)
597 AC_ARG_VAR([MAX_READLENGTH], [Maximum read length for GSNAP (default 300)])
598 if test x"$MAX_READLENGTH" = x; then
599 ACX_EXPAND(MAX_READLENGTH,'300')
600 fi
601 AC_MSG_RESULT($MAX_READLENGTH)
598 # MAX_STACK_READLENGTH
599 AC_MSG_CHECKING(MAX_STACK_READLENGTH)
600 AC_ARG_VAR([MAX_STACK_READLENGTH], [Maximum read length for GSNAP allocating on stack rather than heap (default 300)])
601 if test x"$MAX_STACK_READLENGTH" = x; then
602 ACX_EXPAND(MAX_STACK_READLENGTH,'300')
603 fi
604 AC_MSG_RESULT($MAX_STACK_READLENGTH)
602605
603606
604607 # zlib package
(New empty file)
282282 # Previously included -lrt for shm_open, but we are not calling that
283283
284284 gsnap_nosimd_CC = $(PTHREAD_CC)
285 gsnap_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
285 gsnap_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
286286 gsnap_nosimd_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
287287 gsnap_nosimd_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
288288 dist_gsnap_nosimd_SOURCES = $(GSNAP_FILES)
289289
290290 gsnap_sse2_CC = $(PTHREAD_CC)
291 gsnap_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
291 gsnap_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
292292 gsnap_sse2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
293293 gsnap_sse2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
294294 dist_gsnap_sse2_SOURCES = $(GSNAP_FILES)
295295
296296 gsnap_ssse3_CC = $(PTHREAD_CC)
297 gsnap_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
297 gsnap_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
298298 gsnap_ssse3_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
299299 gsnap_ssse3_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
300300 dist_gsnap_ssse3_SOURCES = $(GSNAP_FILES)
301301
302302 gsnap_sse41_CC = $(PTHREAD_CC)
303 gsnap_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
303 gsnap_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
304304 gsnap_sse41_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
305305 gsnap_sse41_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
306306 dist_gsnap_sse41_SOURCES = $(GSNAP_FILES)
307307
308308 gsnap_sse42_CC = $(PTHREAD_CC)
309 gsnap_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
309 gsnap_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
310310 gsnap_sse42_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
311311 gsnap_sse42_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
312312 dist_gsnap_sse42_SOURCES = $(GSNAP_FILES)
313313
314314 gsnap_avx2_CC = $(PTHREAD_CC)
315 gsnap_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
315 gsnap_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
316316 gsnap_avx2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
317317 gsnap_avx2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
318318 dist_gsnap_avx2_SOURCES = $(GSNAP_FILES)
361361 # Note: dist_ commands get read by bootstrap, and don't follow the flags
362362
363363 gsnapl_nosimd_CC = $(PTHREAD_CC)
364 gsnapl_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
364 gsnapl_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
365365 gsnapl_nosimd_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
366366 gsnapl_nosimd_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
367367 dist_gsnapl_nosimd_SOURCES = $(GSNAPL_FILES)
368368
369369 gsnapl_sse2_CC = $(PTHREAD_CC)
370 gsnapl_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
370 gsnapl_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
371371 gsnapl_sse2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
372372 gsnapl_sse2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
373373 dist_gsnapl_sse2_SOURCES = $(GSNAPL_FILES)
374374
375375 gsnapl_ssse3_CC = $(PTHREAD_CC)
376 gsnapl_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
376 gsnapl_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
377377 gsnapl_ssse3_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
378378 gsnapl_ssse3_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
379379 dist_gsnapl_ssse3_SOURCES = $(GSNAPL_FILES)
380380
381381 gsnapl_sse41_CC = $(PTHREAD_CC)
382 gsnapl_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
382 gsnapl_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
383383 gsnapl_sse41_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
384384 gsnapl_sse41_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
385385 dist_gsnapl_sse41_SOURCES = $(GSNAPL_FILES)
386386
387387 gsnapl_sse42_CC = $(PTHREAD_CC)
388 gsnapl_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
388 gsnapl_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
389389 gsnapl_sse42_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
390390 gsnapl_sse42_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
391391 dist_gsnapl_sse42_SOURCES = $(GSNAPL_FILES)
392392
393393 gsnapl_avx2_CC = $(PTHREAD_CC)
394 gsnapl_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
394 gsnapl_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
395395 gsnapl_avx2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
396396 gsnapl_avx2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
397397 dist_gsnapl_avx2_SOURCES = $(GSNAPL_FILES)
435435 getopt.c getopt1.c getopt.h uniqscan.c
436436
437437 uniqscan_CC = $(PTHREAD_CC)
438 uniqscan_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
438 uniqscan_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
439439 uniqscan_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
440440 uniqscan_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
441441
477477 getopt.c getopt1.c getopt.h uniqscan.c
478478
479479 uniqscanl_CC = $(PTHREAD_CC)
480 uniqscanl_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
480 uniqscanl_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
481481 uniqscanl_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
482482 uniqscanl_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
483483
701701 # intlistdef.h intlist.c intlist.h listdef.h list.c list.h \
702702 # univinterval.c univinterval.h interval.c interval.h \
703703 # uintlist.c uintlist.h \
704 # chrom.c chrom.h stopwatch.c stopwatch.h access.c access.h \
704 # chrom.c chrom.h stopwatch.c stopwatch.h semaphore.c semaphore.h access.c access.h \
705705 # iit-read-univ.c iit-read-univ.h iitdef.h iit-read.c iit-read.h \
706706 # filestring.c filestring.h \
707707 # md5.c md5.h complement.h bzip2.c bzip2.h sequence.c sequence.h \
708708 # genome.c genome.h \
709709 # genomicpos.c genomicpos.h \
710 # chrnum.c chrnum.h chrsubset.c chrsubset.h \
710 # chrnum.c chrnum.h \
711711 # maxent.c maxent.h \
712712 # branchpoint.c branchpoint.h \
713713 # parserange.c parserange.h datadir.c datadir.h getopt.c getopt1.c getopt.h splicing-score.c
21892189 ETAGS = etags
21902190 CTAGS = ctags
21912191 am__DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/config.h.in \
2192 $(top_srcdir)/config/depcomp
2192 $(top_srcdir)/config/depcomp ChangeLog compile
21932193 DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
21942194 ACLOCAL = @ACLOCAL@
21952195 ALLOCA = @ALLOCA@
22352235 LN_S = @LN_S@
22362236 LTLIBOBJS = @LTLIBOBJS@
22372237 LT_SYS_LIBRARY_PATH = @LT_SYS_LIBRARY_PATH@
2238 MAINT = @MAINT@
22382239 MAKEINFO = @MAKEINFO@
22392240 MANIFEST_TOOL = @MANIFEST_TOOL@
2240 MAX_READLENGTH = @MAX_READLENGTH@
2241 MAX_STACK_READLENGTH = @MAX_STACK_READLENGTH@
22412242 MKDIR_P = @MKDIR_P@
22422243 MPICC = @MPICC@
22432244 MPILIBS = @MPILIBS@
25472548 # Note: dist_ commands get read by bootstrap, and don't follow the flags
25482549 # Previously included -lrt for shm_open, but we are not calling that
25492550 gsnap_nosimd_CC = $(PTHREAD_CC)
2550 gsnap_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
2551 gsnap_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
25512552 gsnap_nosimd_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
25522553 gsnap_nosimd_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
25532554 dist_gsnap_nosimd_SOURCES = $(GSNAP_FILES)
25542555 gsnap_sse2_CC = $(PTHREAD_CC)
2555 gsnap_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
2556 gsnap_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
25562557 gsnap_sse2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
25572558 gsnap_sse2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
25582559 dist_gsnap_sse2_SOURCES = $(GSNAP_FILES)
25592560 gsnap_ssse3_CC = $(PTHREAD_CC)
2560 gsnap_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
2561 gsnap_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
25612562 gsnap_ssse3_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
25622563 gsnap_ssse3_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
25632564 dist_gsnap_ssse3_SOURCES = $(GSNAP_FILES)
25642565 gsnap_sse41_CC = $(PTHREAD_CC)
2565 gsnap_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
2566 gsnap_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
25662567 gsnap_sse41_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
25672568 gsnap_sse41_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
25682569 dist_gsnap_sse41_SOURCES = $(GSNAP_FILES)
25692570 gsnap_sse42_CC = $(PTHREAD_CC)
2570 gsnap_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
2571 gsnap_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
25712572 gsnap_sse42_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
25722573 gsnap_sse42_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
25732574 dist_gsnap_sse42_SOURCES = $(GSNAP_FILES)
25742575 gsnap_avx2_CC = $(PTHREAD_CC)
2575 gsnap_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
2576 gsnap_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
25762577 gsnap_avx2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
25772578 gsnap_avx2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
25782579 dist_gsnap_avx2_SOURCES = $(GSNAP_FILES)
26162617
26172618 # Note: dist_ commands get read by bootstrap, and don't follow the flags
26182619 gsnapl_nosimd_CC = $(PTHREAD_CC)
2619 gsnapl_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
2620 gsnapl_nosimd_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
26202621 gsnapl_nosimd_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26212622 gsnapl_nosimd_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26222623 dist_gsnapl_nosimd_SOURCES = $(GSNAPL_FILES)
26232624 gsnapl_sse2_CC = $(PTHREAD_CC)
2624 gsnapl_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
2625 gsnapl_sse2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 $(SIMD_SSE2_CFLAGS)
26252626 gsnapl_sse2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26262627 gsnapl_sse2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26272628 dist_gsnapl_sse2_SOURCES = $(GSNAPL_FILES)
26282629 gsnapl_ssse3_CC = $(PTHREAD_CC)
2629 gsnapl_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
2630 gsnapl_ssse3_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 $(SIMD_SSSE3_CFLAGS)
26302631 gsnapl_ssse3_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26312632 gsnapl_ssse3_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26322633 dist_gsnapl_ssse3_SOURCES = $(GSNAPL_FILES)
26332634 gsnapl_sse41_CC = $(PTHREAD_CC)
2634 gsnapl_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
2635 gsnapl_sse41_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 $(SIMD_SSE4_1_CFLAGS)
26352636 gsnapl_sse41_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26362637 gsnapl_sse41_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26372638 dist_gsnapl_sse41_SOURCES = $(GSNAPL_FILES)
26382639 gsnapl_sse42_CC = $(PTHREAD_CC)
2639 gsnapl_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
2640 gsnapl_sse42_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 $(SIMD_SSE4_2_CFLAGS)
26402641 gsnapl_sse42_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26412642 gsnapl_sse42_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26422643 dist_gsnapl_sse42_SOURCES = $(GSNAPL_FILES)
26432644 gsnapl_avx2_CC = $(PTHREAD_CC)
2644 gsnapl_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
2645 gsnapl_avx2_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS) -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -DHAVE_AVX2=1 $(SIMD_AVX2_CFLAGS)
26452646 gsnapl_avx2_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26462647 gsnapl_avx2_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26472648 dist_gsnapl_avx2_SOURCES = $(GSNAPL_FILES)
26842685 getopt.c getopt1.c getopt.h uniqscan.c
26852686
26862687 uniqscan_CC = $(PTHREAD_CC)
2687 uniqscan_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
2688 uniqscan_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 $(POPCNT_CFLAGS)
26882689 uniqscan_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
26892690 uniqscan_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
26902691 dist_uniqscan_SOURCES = $(UNIQSCAN_FILES)
27232724 getopt.c getopt1.c getopt.h uniqscan.c
27242725
27252726 uniqscanl_CC = $(PTHREAD_CC)
2726 uniqscanl_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_READLENGTH=$(MAX_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
2727 uniqscanl_CFLAGS = $(AM_CFLAGS) $(PTHREAD_CFLAGS) -DTARGET=\"$(target)\" -DGMAPDB=\"$(GMAPDB)\" -DMAX_STACK_READLENGTH=$(MAX_STACK_READLENGTH) -DGSNAP=1 -DLARGE_GENOMES=1 $(POPCNT_CFLAGS)
27272728 uniqscanl_LDFLAGS = $(AM_LDFLAGS) $(STATIC_LDFLAG)
27282729 uniqscanl_LDADD = $(PTHREAD_LIBS) $(ZLIB_LIBS) $(BZLIB_LIBS)
27292730 dist_uniqscanl_SOURCES = $(UNIQSCANL_FILES)
29272928
29282929 .SUFFIXES:
29292930 .SUFFIXES: .c .lo .o .obj
2930 $(srcdir)/Makefile.in: $(srcdir)/Makefile.am $(am__configure_deps)
2931 $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
29312932 @for dep in $?; do \
29322933 case '$(am__configure_deps)' in \
29332934 *$$dep*) \
29512952 $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
29522953 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
29532954
2954 $(top_srcdir)/configure: $(am__configure_deps)
2955 $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
29552956 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
2956 $(ACLOCAL_M4): $(am__aclocal_m4_deps)
2957 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
29572958 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
29582959 $(am__aclocal_m4_deps):
29592960
29642965 stamp-h1: $(srcdir)/config.h.in $(top_builddir)/config.status
29652966 @rm -f stamp-h1
29662967 cd $(top_builddir) && $(SHELL) ./config.status src/config.h
2967 $(srcdir)/config.h.in: $(am__configure_deps)
2968 $(srcdir)/config.h.in: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
29682969 ($(am__cd) $(top_srcdir) && $(AUTOHEADER))
29692970 rm -f stamp-h1
29702971 touch $@
4148541486 # intlistdef.h intlist.c intlist.h listdef.h list.c list.h \
4148641487 # univinterval.c univinterval.h interval.c interval.h \
4148741488 # uintlist.c uintlist.h \
41488 # chrom.c chrom.h stopwatch.c stopwatch.h access.c access.h \
41489 # chrom.c chrom.h stopwatch.c stopwatch.h semaphore.c semaphore.h access.c access.h \
4148941490 # iit-read-univ.c iit-read-univ.h iitdef.h iit-read.c iit-read.h \
4149041491 # filestring.c filestring.h \
4149141492 # md5.c md5.h complement.h bzip2.c bzip2.h sequence.c sequence.h \
4149241493 # genome.c genome.h \
4149341494 # genomicpos.c genomicpos.h \
41494 # chrnum.c chrnum.h chrsubset.c chrsubset.h \
41495 # chrnum.c chrnum.h \
4149541496 # maxent.c maxent.h \
4149641497 # branchpoint.c branchpoint.h \
4149741498 # parserange.c parserange.h datadir.c datadir.h getopt.c getopt1.c getopt.h splicing-score.c
0 static char rcsid[] = "$Id: atoi.c 195988 2016-08-08 19:29:00Z twu $";
0 static char rcsid[] = "$Id: atoi.c 195989 2016-08-08 21:42:24Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
0 static char rcsid[] = "$Id: bytecoding.c 179281 2015-11-20 00:10:35Z twu $";
0 static char rcsid[] = "$Id: bytecoding.c 196402 2016-08-16 14:29:06Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
652652 /* return exceptions[highi + 1]; */
653653
654654 fprintf(stderr,"Bytecoding_read should have found index %u as an exception, but failed\n",key);
655 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
656 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
655657 abort();
656658 }
657659 }
711713 /* return exceptions[highi + 1]; */
712714
713715 fprintf(stderr,"Bytecoding_read_wguide should have found index %u as an exception, but failed\n",key);
716 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
717 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
714718 abort();
715719 }
716720 }
765769 /* return exceptions[highi + 1]; */
766770
767771 fprintf(stderr,"Bytecoding_lcp should have found index %u as an exception, but failed\n",key);
772 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
773 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
768774 abort();
769775 }
770776 }
846852 /* return exceptions[highi + 1]; */
847853
848854 fprintf(stderr,"Bytecoding_lcpchilddc_child_up should have found index %u as an exception, but failed\n",key);
855 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
856 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
849857 abort();
850858 }
851859 }
907915 /* return exceptions[highi + 1]; */
908916
909917 fprintf(stderr,"Bytecoding_lcpchilddc_child_next should have found index %u as an exception, but failed\n",key);
918 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
919 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
910920 abort();
911921 }
912922 }
974984 /* return exceptions[highi + 1]; */
975985
976986 fprintf(stderr,"Bytecoding_lcpchilddc_lcp_next should have found index %u as an exception, but failed\n",key);
987 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
988 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
977989 abort();
978990 }
979991 }
10451057 /* return exceptions[highi + 1]; */
10461058
10471059 fprintf(stderr,"Bytecoding_lcpchilddcn_child_up should have found index %u as an exception, but failed\n",key);
1060 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
1061 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
10481062 abort();
10491063 }
10501064 }
11171131 /* return exceptions[highi + 1]; */
11181132
11191133 fprintf(stderr,"Bytecoding_lcpchilddcn_child_next should have found index %u as an exception, but failed\n",key);
1134 fprintf(stderr,"One possible cause is a corrupted shared memory segment, if GSNAP has exited abnormally\n");
1135 fprintf(stderr,"Please do 'ipcs -m' and then 'ipcrm -m' on each of those segments\n");
11201136 abort();
11211137 }
11221138 }
0 static char rcsid[] = "$Id: cmet.c 195988 2016-08-08 19:29:00Z twu $";
0 static char rcsid[] = "$Id: cmet.c 195989 2016-08-08 21:42:24Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
0 /* $Id: comp.h 195763 2016-08-04 01:37:20Z twu $ */
0 /* $Id: comp.h 195548 2016-08-02 17:18:50Z twu $ */
11 #ifndef COMP_INCLUDED
22 #define COMP_INCLUDED
33
0 -*- mode: compilation; default-directory: "~/bioinfo/gmap/trunk/src/" -*-
1 Compilation started at Mon Dec 14 14:13:20
2
3 make -k gsnap.sse42
4 /gne/home/twu/bin/gcc -DHAVE_CONFIG_H -I. -pthread -DTARGET=\"x86_64-unknown-linux-gnu\" -DGMAPDB=\"/gne/research/data/bioinfo/gmap/data/genomes\" -DMAX_READLENGTH=300 -DGSNAP=1 -DHAVE_SSE2=1 -DHAVE_SSSE3=1 -DHAVE_SSE4_1=1 -DHAVE_SSE4_2=1 -msse2 -mssse3 -msse4.1 -msse4.2 -mpopcnt -g -Wall -Wextra -DCHECK_ASSERTIONS=1 -MT gsnap_sse42-dynprog_simd.o -MD -MP -MF .deps/gsnap_sse42-dynprog_simd.Tpo -c -o gsnap_sse42-dynprog_simd.o `test -f 'dynprog_simd.c' || echo './'`dynprog_simd.c
5 dynprog_simd.c: In function ‘Dynprog_simd_8’:
6 dynprog_simd.c:2143:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
7 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
8 ^
9 dynprog_simd.c:2143:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
10 dynprog_simd.c:2144:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
11 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
12 ^
13 dynprog_simd.c:2144:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
14 dynprog_simd.c:2347:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
15 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
16 ^
17 dynprog_simd.c:2347:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
18 dynprog_simd.c:2348:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
19 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
20 ^
21 dynprog_simd.c:2348:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
22 dynprog_simd.c:1942:33: warning: variable ‘extend_ladder’ set but not used [-Wunused-but-set-variable]
23 __m128i gap_open, gap_extend, extend_ladder, complement_dummy;
24 ^
25 dynprog_simd.c: In function ‘Dynprog_simd_8_upper’:
26 dynprog_simd.c:2770:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
27 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
28 ^
29 dynprog_simd.c:2770:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
30 dynprog_simd.c:2771:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
31 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
32 ^
33 dynprog_simd.c:2771:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
34 dynprog_simd.c:2896:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
35 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
36 ^
37 dynprog_simd.c:2896:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
38 dynprog_simd.c:2897:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
39 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
40 ^
41 dynprog_simd.c:2897:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
42 dynprog_simd.c:2632:8: warning: unused variable ‘na2_single’ [-Wunused-variable]
43 char na2_single;
44 ^
45 dynprog_simd.c:2626:70: warning: unused variable ‘pairscore’ [-Wunused-variable]
46 Score8_T *pairscores[5], *pairscores_std_ptr, *pairscores_alt_ptr, pairscore;
47 ^
48 dynprog_simd.c: In function ‘Dynprog_simd_8_lower’:
49 dynprog_simd.c:3238:3: error: ‘extend_ladder’ undeclared (first use in this function)
50 extend_ladder = _mm_setr_epi8(0,extend,2*extend,3*extend,4*extend,5*extend,6*extend,7*ext
51 ^
52 dynprog_simd.c:3238:3: note: each undeclared identifier is reported only once for each function it appears in
53 dynprog_simd.c:3267:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
54 na1 = revp ? nt_to_int_array[rsequence[1-r]] : nt_to_int_array[rsequence[r-1]];
55 ^
56 dynprog_simd.c:3267:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
57 dynprog_simd.c:3389:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
58 na1 = revp ? nt_to_int_array[rsequence[1-r]] : nt_to_int_array[rsequence[r-1]];
59 ^
60 dynprog_simd.c:3389:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
61 dynprog_simd.c:3089:8: warning: unused variable ‘na2_single’ [-Wunused-variable]
62 char na2_single;
63 ^
64 dynprog_simd.c:3083:45: warning: unused variable ‘pairscore’ [-Wunused-variable]
65 Score8_T *pairscores[5], *pairscores_ptr, pairscore;
66 ^
67 dynprog_simd.c: In function ‘Dynprog_simd_16’:
68 dynprog_simd.c:3739:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
69 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
70 ^
71 dynprog_simd.c:3739:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
72 dynprog_simd.c:3740:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
73 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
74 ^
75 dynprog_simd.c:3740:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
76 dynprog_simd.c:3923:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
77 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
78 ^
79 dynprog_simd.c:3923:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
80 dynprog_simd.c:3924:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
81 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
82 ^
83 dynprog_simd.c:3924:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
84 dynprog_simd.c:3563:33: warning: variable ‘extend_ladder’ set but not used [-Wunused-but-set-variable]
85 __m128i gap_open, gap_extend, extend_ladder, complement_dummy;
86 ^
87 dynprog_simd.c: In function ‘Dynprog_simd_16_upper’:
88 dynprog_simd.c:4259:3: error: ‘extend_ladder’ undeclared (first use in this function)
89 extend_ladder = _mm_setr_epi16(0,extend,2*extend,3*extend,4*extend,5*extend,6*extend,7*ex
90 ^
91 dynprog_simd.c:4284:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
92 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
93 ^
94 dynprog_simd.c:4284:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
95 dynprog_simd.c:4285:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
96 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
97 ^
98 dynprog_simd.c:4285:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
99 dynprog_simd.c:4381:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
100 na2 = revp ? nt_to_int_array[gsequence[1-c]] : nt_to_int_array[gsequence[c-1]];
101 ^
102 dynprog_simd.c:4381:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
103 dynprog_simd.c:4382:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
104 na2_alt = revp ? nt_to_int_array[gsequence_alt[1-c]] : nt_to_int_array[gsequence_alt[c-1
105 ^
106 dynprog_simd.c:4382:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
107 dynprog_simd.c:4158:8: warning: unused variable ‘na2_single’ [-Wunused-variable]
108 char na2_single;
109 ^
110 dynprog_simd.c:4152:71: warning: unused variable ‘pairscore’ [-Wunused-variable]
111 Score16_T *pairscores[5], *pairscores_std_ptr, *pairscores_alt_ptr, pairscore;
112 ^
113 dynprog_simd.c: In function ‘Dynprog_simd_16_lower’:
114 dynprog_simd.c:4675:3: error: ‘extend_ladder’ undeclared (first use in this function)
115 extend_ladder = _mm_setr_epi16(0,extend,2*extend,3*extend,4*extend,5*extend,6*extend,7*ex
116 ^
117 dynprog_simd.c:4699:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
118 na1 = revp ? nt_to_int_array[rsequence[1-r]] : nt_to_int_array[rsequence[r-1]];
119 ^
120 dynprog_simd.c:4699:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
121 dynprog_simd.c:4792:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
122 na1 = revp ? nt_to_int_array[rsequence[1-r]] : nt_to_int_array[rsequence[r-1]];
123 ^
124 dynprog_simd.c:4792:4: warning: array subscript has type ‘char’ [-Wchar-subscripts]
125 dynprog_simd.c:4542:8: warning: unused variable ‘na2_single’ [-Wunused-variable]
126 char na2_single;
127 ^
128 dynprog_simd.c:4536:46: warning: unused variable ‘pairscore’ [-Wunused-variable]
129 Score16_T *pairscores[5], *pairscores_ptr, pairscore;
130 ^
131 dynprog_simd.c: In function ‘Dynprog_traceback_8_lower’:
132 dynprog_simd.c:5278:8: warning: unused variable ‘add_dashes_p’ [-Wunused-variable]
133 bool add_dashes_p;
134 ^
135 dynprog_simd.c:5275:11: warning: unused parameter ‘cdna_direction’ [-Wunused-parameter]
136 int cdna_direction, bool watsonp, int dynprogindex) {
137 ^
138 dynprog_simd.c: In function ‘Dynprog_traceback_16_lower’:
139 dynprog_simd.c:5662:8: warning: unused variable ‘add_dashes_p’ [-Wunused-variable]
140 bool add_dashes_p;
141 ^
142 dynprog_simd.c:5659:12: warning: unused parameter ‘cdna_direction’ [-Wunused-parameter]
143 int cdna_direction, bool watsonp, int dynprogindex) {
144 ^
145 dynprog_simd.c: At top level:
146 dynprog_simd.c:1:13: warning: ‘rcsid’ defined but not used [-Wunused-variable]
147 static char rcsid[] = "$Id: dynprog_simd.c 146623 2014-09-02 21:31:32Z twu $";
148 ^
149 dynprog_simd.c:510:1: warning: ‘Directions8_print’ defined but not used [-Wunused-function]
150 Directions8_print (Direction8_T **directions_nogap, Direction8_T **directions_Egap, Directi
151 ^
152 dynprog_simd.c:604:1: warning: ‘Directions8_print_ud’ defined but not used [-Wunused-function]
153 Directions8_print_ud (Direction8_T **directions_nogap, Direction8_T **directions_Egap,
154 ^
155 dynprog_simd.c:713:1: warning: ‘Directions16_print’ defined but not used [-Wunused-function]
156 Directions16_print (Direction16_T **directions_nogap, Direction16_T **directions_Egap, Dire
157 ^
158 dynprog_simd.c:807:1: warning: ‘Directions16_print_ud’ defined but not used [-Wunused-function]
159 Directions16_print_ud (Direction16_T **directions_nogap, Direction16_T **directions_Egap,
160 ^
161 make: *** [gsnap_sse42-dynprog_simd.o] Error 1
162 make: Target `gsnap.sse42' not remade because of errors.
163
164 Compilation exited abnormally with code 2 at Mon Dec 14 14:13:23
0 static char rcsid[] = "$Id: filestring.c 195969 2016-08-08 17:01:27Z twu $";
0 static char rcsid[] = "$Id: filestring.c 196273 2016-08-12 15:15:06Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
0 static char rcsid[] = "$Id: genome_sites.c 195749 2016-08-03 23:35:09Z twu $";
0 static char rcsid[] = "$Id: genome_sites.c 196273 2016-08-12 15:15:06Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
0 static char rcsid[] = "$Id: gmap.c 193877 2016-07-12 02:46:33Z twu $";
0 static char rcsid[] = "$Id: gmap.c 196403 2016-08-16 14:33:56Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
381381
382382 /* GFF3 */
383383 static bool gff3_separators_p = true;
384 static bool gff3_phase_swap_p = false;
384385
385386 /* SAM */
386387 /* Applicable to PMAP? */
556557 {"require-splicedir", no_argument, 0, 0}, /* require_splicedir_p */
557558
558559 {"gff3-add-separators", required_argument, 0, 0}, /* gff3_separators_p */
560 {"gff3-swap-phase", required_argument, 0, 0}, /* gff3_phase_swap_p */
559561
560562 #ifndef PMAP
561563 {"quality-protocol", required_argument, 0, 0}, /* quality_shift */
52975299 split_output_root = optarg;
52985300 } else if (!strcmp(long_name,"append-output")) {
52995301 appendp = true;
5302
53005303 } else if (!strcmp(long_name,"gff3-add-separators")) {
53015304 if (!strcmp(optarg,"1")) {
53025305 gff3_separators_p = true;
53045307 gff3_separators_p = false;
53055308 } else {
53065309 fprintf(stderr,"--gff3-add-separators flag must be 0 or 1\n");
5310 return 9;
5311 }
5312
5313 } else if (!strcmp(long_name,"gff3-swap-phase")) {
5314 if (!strcmp(optarg,"1")) {
5315 gff3_phase_swap_p = true;
5316 } else if (!strcmp(optarg,"0")) {
5317 gff3_phase_swap_p = false;
5318 } else {
5319 fprintf(stderr,"--gff3-swap-phase flag must be 0 or 1\n");
53075320 return 9;
53085321 }
53095322
65746587 Pair_setup(trim_mismatch_score,trim_indel_score,gff3_separators_p,sam_insert_0M_p,
65756588 force_xs_direction_p,md_lowercase_variant_p,
65766589 /*snps_p*/genomecomp_alt ? true : false,
6577 /*print_nsnpdiffs_p*/genomecomp_alt ? true : false,genomelength);
6590 /*print_nsnpdiffs_p*/genomecomp_alt ? true : false,genomelength,
6591 gff3_phase_swap_p);
65786592 Stage3_setup(/*splicingp*/novelsplicingp == true || knownsplicingp == true,novelsplicingp,
65796593 require_splicedir_p,splicing_iit,splicing_divint_crosstable,
65806594 donor_typeint,acceptor_typeint,splicesites,altlocp,alias_starts,alias_ends,
71837197 fprintf(stdout,"\
71847198 --gff3-add-separators=INT Whether to add a ### separator after each query sequence\n\
71857199 Values: 0 (no), 1 (yes, default)\n\
7200 --gff3-swap-phase=INT Whether to swap phase (0 => 0, 1 => 2, 2 => 1) in gff3_gene format\n\
7201 Needed by some analysis programs, but deviates from GFF3 specification\n\
7202 Values: 0 (no, default), 1 (yes)\n\
71867203 ");
71877204 fprintf(stdout,"\n");
71887205
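For illustration, the phase mapping described in the `--gff3-swap-phase` help text (0 => 0, 1 => 2, 2 => 1) can be sketched as a small standalone function. The name `swap_phase` is hypothetical and is not taken from pair.c; this shows only one possible way to express the mapping, assuming the input phase is always 0, 1, or 2.

```c
/* Hypothetical helper, not from pair.c.
   Maps a GFF3 phase of 0, 1, or 2 to its swapped value:
   0 stays 0, 1 becomes 2, and 2 becomes 1. */
static int
swap_phase (int phase) {
  return (3 - phase) % 3;
}
```

A caller would presumably apply such a mapping only when `gff3_phase_swap_p` is true and print the phase unchanged otherwise.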
0 static char rcsid[] = "$Id: gsnap.c 195760 2016-08-04 00:12:04Z twu $";
0 static char rcsid[] = "$Id: gsnap.c 196438 2016-08-16 20:23:27Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
120120
121121 #define MIN_INDEXDB_SIZE_THRESHOLD 100
122122
123 #define MAX_FLOORS_READLENGTH 300
123124 #define MAX_QUERYLENGTH_FOR_ALLOC 100000
124125 #define MAX_GENOMICLENGTH_FOR_ALLOC 1000000
125126
751752 genomedir = Datadir_find_genomedir(/*user_genomedir*/NULL);
752753 fprintf(stdout,"Default gmap directory (environment): %s\n",genomedir);
753754 FREE(genomedir);
754 fprintf(stdout,"Maximum read length: %d\n",MAX_READLENGTH);
755 fprintf(stdout,"Maximum stack read length: %d\n",MAX_STACK_READLENGTH);
755756 fprintf(stdout,"Thomas D. Wu, Genentech, Inc.\n");
756757 fprintf(stdout,"Contact: twu@gene.com\n");
757758 fprintf(stdout,"\n");
10851086 cellpool = Cellpool_new();
10861087 worker_stopwatch = (timingp == true) ? Stopwatch_new() : (Stopwatch_T) NULL;
10871088
1088 floors_array = (Floors_T *) CALLOC(MAX_READLENGTH+1,sizeof(Floors_T));
1089 floors_array = (Floors_T *) CALLOC(MAX_FLOORS_READLENGTH+1,sizeof(Floors_T));
10891090
10901091 /* Except_stack_create(); -- requires pthreads */
10911092
11831184
11841185 /* Except_stack_destroy(); -- requires pthreads */
11851186
1186 for (i = 0; i <= MAX_READLENGTH; i++) {
1187 for (i = 0; i <= MAX_FLOORS_READLENGTH; i++) {
11871188 if (floors_array[i] != NULL) {
11881189 Floors_free_keep(&(floors_array[i]));
11891190 }
12591260 cellpool = Cellpool_new();
12601261 worker_stopwatch = (timingp == true) ? Stopwatch_new() : (Stopwatch_T) NULL;
12611262
1262 floors_array = (Floors_T *) CALLOC(MAX_READLENGTH+1,sizeof(Floors_T));
1263 floors_array = (Floors_T *) CALLOC(MAX_FLOORS_READLENGTH+1,sizeof(Floors_T));
12631264
12641265 Except_stack_create();
12651266
13591360
13601361 Except_stack_destroy();
13611362
1362 for (i = 0; i <= MAX_READLENGTH; i++) {
1363 for (i = 0; i <= MAX_FLOORS_READLENGTH; i++) {
13631364 if (floors_array[i] != NULL) {
13641365 Floors_free_keep(&(floors_array[i]));
13651366 }
32993300 Pair_setup(trim_mismatch_score,trim_indel_score,/*gff3_separators_p*/false,sam_insert_0M_p,
33003301 force_xs_direction_p,md_lowercase_variant_p,
33013302 /*snps_p*/snps_iit ? true : false,print_nsnpdiffs_p,
3302 Univ_IIT_genomelength(chromosome_iit,/*with_circular_alias*/false));
3303 Univ_IIT_genomelength(chromosome_iit,/*with_circular_alias*/false),
3304 /*gff3_phase_swap_p*/false);
33033305 Stage3_setup(/*splicingp*/novelsplicingp == true || knownsplicingp == true,novelsplicingp,
33043306 /*require_splicedir_p*/true,splicing_iit,splicing_divint_crosstable,
33053307 donor_typeint,acceptor_typeint,splicesites,altlocp,alias_starts,alias_ends,
33373339 nullgap,maxpeelback,maxpeelback_distalmedial,
33383340 extramaterial_end,extramaterial_paired,gmap_mode,
33393341 trigger_score_for_gmap,gmap_allowance,max_gmap_pairsearch,
3340 max_gmap_terminal,max_gmap_improvement,antistranded_penalty);
3342 max_gmap_terminal,max_gmap_improvement,antistranded_penalty,
3343 MAX_FLOORS_READLENGTH);
33413344 Substring_setup(print_nsnpdiffs_p,print_snplabels_p,
33423345 show_refdiff_p,snps_iit,snps_divint_crosstable,
33433346 genes_iit,genes_divint_crosstable,
0 static char rcsid[] = "$Id: indel.c 193229 2016-06-30 22:31:10Z twu $";
0 static char rcsid[] = "$Id: indel.c 196431 2016-08-16 20:19:22Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
4444 #endif
4545 int nmismatches_left, nmismatches_right;
4646 int best_sum, sum, nmismatches_lefti, nmismatches_righti, lefti, righti;
47
47 int *mismatch_positions_left, *mismatch_positions_right;
4848
4949 #ifdef HAVE_ALLOCA
50 int *mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
51 int *mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
52 #else
53 int mismatch_positions_left[MAX_READLENGTH], mismatch_positions_right[MAX_READLENGTH];
54
55 if (max_mismatches_allowed > MAX_READLENGTH) {
56 max_mismatches_allowed = MAX_READLENGTH;
57 }
58 #endif
59
50 if (querylength <= MAX_STACK_READLENGTH) {
51 mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
52 mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
53 } else {
54 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
55 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
56 }
57 #else
58 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
59 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
60 #endif
61
62 if (max_mismatches_allowed > querylength) {
63 max_mismatches_allowed = querylength;
64 }
6065
6166 /* query has insertion. Get |indels| less from genome; trim from left. */
6267 /* left = ptr->diagonal - querylength; */
167172 }
168173 debug2(printf("\n"));
169174
175 #ifdef HAVE_ALLOCA
176 if (querylength <= MAX_STACK_READLENGTH) {
177 FREEA(mismatch_positions_left);
178 FREEA(mismatch_positions_right);
179 } else {
180 FREE(mismatch_positions_left);
181 FREE(mismatch_positions_right);
182 }
183 #else
184 FREE(mismatch_positions_left);
185 FREE(mismatch_positions_right);
186 #endif
187
170188 *best_nmismatches_i = nmismatches_lefti;
171189 *best_nmismatches_j = nmismatches_righti;
172190
202220 #endif
203221 int nmismatches_left, nmismatches_right, nmismatches_lefti, nmismatches_righti;
204222 int best_sum, sum, lefti, righti;
223 int *mismatch_positions_left, *mismatch_positions_right;
205224
206225 #ifdef HAVE_ALLOCA
207 int *mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
208 int *mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
209 #else
210 int mismatch_positions_left[MAX_READLENGTH], mismatch_positions_right[MAX_READLENGTH];
211
212 if (max_mismatches_allowed > MAX_READLENGTH) {
213 max_mismatches_allowed = MAX_READLENGTH;
214 }
215 #endif
216
226 if (querylength <= MAX_STACK_READLENGTH) {
227 mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
228 mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
229 } else {
230 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
231 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
232 }
233 #else
234 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
235 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
236 #endif
237
238 if (max_mismatches_allowed > querylength) {
239 max_mismatches_allowed = querylength;
240 }
217241
218242 /* query has deletion. Get |indels| more from genome; add to right. */
219243 /* left = ptr->diagonal - querylength; */
318342 }
319343 debug2(printf("\n"));
320344
345 #ifdef HAVE_ALLOCA
346 if (querylength <= MAX_STACK_READLENGTH) {
347 FREEA(mismatch_positions_left);
348 FREEA(mismatch_positions_right);
349 } else {
350 FREE(mismatch_positions_left);
351 FREE(mismatch_positions_right);
352 }
353 #else
354 FREE(mismatch_positions_left);
355 FREE(mismatch_positions_right);
356 #endif
357
321358 *best_nmismatches_i = nmismatches_lefti;
322359 *best_nmismatches_j = nmismatches_righti;
323360
356393 int nmismatches_left, nmismatches_right;
357394 int best_sum, sum, nmismatches_lefti, nmismatches_righti, lefti, righti;
358395 int nmismatches1, nmismatches2;
396 int *mismatch_positions_left, *mismatch_positions_right;
359397
360398 #ifdef HAVE_ALLOCA
361 int *mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
362 int *mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
363 #else
364 int mismatch_positions_left[MAX_READLENGTH], mismatch_positions_right[MAX_READLENGTH];
399 if (querylength <= MAX_STACK_READLENGTH) {
400 mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
401 mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
402 } else {
403 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
404 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
405 }
406 #else
407 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
408 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
365409 #endif
366410
367411
474518 }
475519 debug2(printf("\n"));
476520
521 #ifdef HAVE_ALLOCA
522 if (querylength <= MAX_STACK_READLENGTH) {
523 FREEA(mismatch_positions_left);
524 FREEA(mismatch_positions_right);
525 } else {
526 FREE(mismatch_positions_left);
527 FREE(mismatch_positions_right);
528 }
529 #else
530 FREE(mismatch_positions_left);
531 FREE(mismatch_positions_right);
532 #endif
533
477534 if (best_sum <= max_mismatches_allowed) {
478535 if (plusp == true) {
479536 query_indel_pos = best_indel_pos;
522579 int nmismatches_left, nmismatches_right;
523580 int best_sum, sum, nmismatches_lefti, nmismatches_righti, lefti, righti;
524581 int nmismatches1, nmismatches2;
582 int *mismatch_positions_left, *mismatch_positions_right;
525583
526584 #ifdef HAVE_ALLOCA
527 int *mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
528 int *mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
529 #else
530 int mismatch_positions_left[MAX_READLENGTH], mismatch_positions_right[MAX_READLENGTH];
585 if (querylength <= MAX_STACK_READLENGTH) {
586 mismatch_positions_left = (int *) ALLOCA(querylength * sizeof(int));
587 mismatch_positions_right = (int *) ALLOCA(querylength * sizeof(int));
588 } else {
589 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
590 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
591 }
592 #else
593 mismatch_positions_left = (int *) MALLOC(querylength * sizeof(int));
594 mismatch_positions_right = (int *) MALLOC(querylength * sizeof(int));
531595 #endif
532596
533597
636700 }
637701 debug2(printf("\n"));
638702
703 #ifdef HAVE_ALLOCA
704 if (querylength <= MAX_STACK_READLENGTH) {
705 FREEA(mismatch_positions_left);
706 FREEA(mismatch_positions_right);
707 } else {
708 FREE(mismatch_positions_left);
709 FREE(mismatch_positions_right);
710 }
711 #else
712 FREE(mismatch_positions_left);
713 FREE(mismatch_positions_right);
714 #endif
639715
640716 if (best_sum <= max_mismatches_allowed) {
641717 if (plusp == true) {
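Note on the indel.c hunks above: each function now sizes its scratch arrays by querylength, taking them from the stack when querylength <= MAX_STACK_READLENGTH and from the heap otherwise, and the matching cleanup repeats the same test. The following is a rough sketch of that decision in plain C. STACK_LIMIT and count_mismatches are hypothetical names, the limit is kept tiny so both paths are easy to exercise, and the sketch assumes a platform that provides alloca() in <alloca.h>.

```c
#include <alloca.h>   /* assumption: alloca() is declared here on this platform */
#include <assert.h>
#include <stdlib.h>

#define STACK_LIMIT 8  /* hypothetical stand-in for MAX_STACK_READLENGTH */

/* Counts positions where query and genome differ over querylength bases.
   The positions buffer plays the role of mismatch_positions_left in the diff.
   Returns -1 if heap allocation fails. */
int count_mismatches(const char *query, const char *genome, int querylength) {
  int *positions;
  int n = 0, i;
  int on_heap = (querylength > STACK_LIMIT);

  assert(querylength > 0);
  if (on_heap) {
    /* Long read: the buffer could overflow the stack, so use the heap. */
    positions = (int *) malloc((size_t) querylength * sizeof(int));
    if (positions == NULL) {
      return -1;
    }
  } else {
    /* Short read: stack memory, released automatically on return. */
    positions = (int *) alloca((size_t) querylength * sizeof(int));
  }

  for (i = 0; i < querylength; i++) {
    if (query[i] != genome[i]) {
      positions[n++] = i;
    }
  }

  /* The release must repeat the test used for allocation. */
  if (on_heap) {
    free(positions);
  }
  return n;
}
```

Recording which path was taken (on_heap here, the repeated querylength test in the diff) is what keeps free() from ever being called on stack memory.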
0 static char rcsid[] = "$Id: mapq.c 184376 2016-02-16 23:39:30Z twu $";
0 static char rcsid[] = "$Id: mapq.c 196431 2016-08-16 20:19:22Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
159159
160160 int nmismatches, i;
161161 int alignlength;
162 int *mismatch_positions;
162163
163164 #ifdef HAVE_ALLOCA
164 int *mismatch_positions = (int *) ALLOCA((querylength+1) * sizeof(int));
165 #else
166 int mismatch_positions[MAX_READLENGTH+1];
165 if (querylength <= MAX_STACK_READLENGTH) {
166 mismatch_positions = (int *) ALLOCA((querylength+1) * sizeof(int));
167 } else {
168 mismatch_positions = (int *) MALLOC((querylength+1) * sizeof(int));
169 }
170 #else
171 mismatch_positions = (int *) MALLOC((querylength+1) * sizeof(int));
167172 #endif
168173
169174
253258
254259 }
255260
261 #ifdef HAVE_ALLOCA
262 if (querylength <= MAX_STACK_READLENGTH) {
263 FREEA(mismatch_positions);
264 } else {
265 FREE(mismatch_positions);
266 }
267 #else
268 FREE(mismatch_positions);
269 #endif
270
256271 debug(printf("returning loglik %f\n",loglik));
257272 return loglik;
258273 }
0 static char rcsid[] = "$Id: pair.c 195763 2016-08-04 01:37:20Z twu $";
0 static char rcsid[] = "$Id: pair.c 196403 2016-08-16 14:33:56Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
148148 static bool print_nsnpdiffs_p;
149149 static double genomelength; /* For BLAST E-value */
150150
151 static bool gff3_phase_swap_p = true;
152
151153
152154 void
153155 Pair_setup (int trim_mismatch_score_in, int trim_indel_score_in,
154156 bool gff3_separators_p_in, bool sam_insert_0M_p_in, bool force_xs_direction_p_in,
155157 bool md_lowercase_variant_p_in, bool snps_p_in, bool print_nsnpdiffs_p_in,
156 Univcoord_T genomelength_in) {
158 Univcoord_T genomelength_in, bool gff3_phase_swap_p_in) {
157159 trim_mismatch_score = trim_mismatch_score_in;
158160 trim_indel_score = trim_indel_score_in;
159161 gff3_separators_p = gff3_separators_p_in;
163165 snps_p = snps_p_in;
164166 print_nsnpdiffs_p = print_nsnpdiffs_p_in;
165167 genomelength = (double) genomelength_in;
168 gff3_phase_swap_p = gff3_phase_swap_p_in;
166169
167170 return;
168171 }
24482451 }
24492452 }
24502453
2451 FPRINTF(fp,"%d\t",cds_phase); /* 8: phase */
2454 if (gff3_phase_swap_p == true && cds_phase > 0) {
2455 /* Some analysis programs want phase in gff3 to be different */
2456 FPRINTF(fp,"%d\t",3 - cds_phase); /* 8: phase */
2457 } else {
2458 /* This appears to be the specification: a phase of 0 indicates
2459 that the next codon begins at the first base of the region
2460 described by the current line, a phase of 1 indicates that the
2461 next codon begins at the second base of this region, and a
2462 phase of 2 indicates that the codon begins at the third base of
2463 this region. */
2464 FPRINTF(fp,"%d\t",cds_phase); /* 8: phase */
2465 }
24522466
24532467 /* 9: features */
24542468 FPRINTF(fp,"ID=%s.mrna%d.cds%d;",accession,pathnum,cdsno);
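Note on the pair.c hunk above: with the swap enabled, a nonzero CDS phase is printed as 3 - cds_phase, and phase 0 is printed unchanged. The following is a small sketch of just that mapping. gff3_output_phase is a hypothetical helper name, and the real code prints the value with FPRINTF rather than returning it.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the value written to GFF3 column 8 for a CDS line.
   swap_p corresponds to gff3_phase_swap_p in the diff. */
int gff3_output_phase(int cds_phase, bool swap_p) {
  assert(cds_phase >= 0 && cds_phase <= 2);
  if (swap_p == true && cds_phase > 0) {
    return 3 - cds_phase;   /* 1 -> 2, 2 -> 1 */
  } else {
    return cds_phase;       /* 0 is never changed */
  }
}
```

With the swap on, phases 0, 1, 2 are reported as 0, 2, 1. With it off, they are reported unchanged.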
0 /* $Id: pair.h 193230 2016-06-30 22:32:37Z twu $ */
0 /* $Id: pair.h 196403 2016-08-16 14:33:56Z twu $ */
11 #ifndef PAIR_INCLUDED
22 #define PAIR_INCLUDED
33
3131 Pair_setup (int trim_mismatch_score_in, int trim_indel_score_in,
3232 bool gff3_separators_p_in, bool sam_insert_0M_p_in, bool force_xs_direction_p_in,
3333 bool md_lowercase_variant_p_in, bool snps_p_in, bool print_nsnpdiffs_p_in,
34 Univcoord_T genomelength_in);
34 Univcoord_T genomelength_in, bool gff3_phase_swap_p_in);
3535 extern int
3636 Pair_querypos (T this);
3737 extern Chrpos_T
0 static char rcsid[] = "$Id: pairpool.c 195763 2016-08-04 01:37:20Z twu $";
0 static char rcsid[] = "$Id: pairpool.c 195548 2016-08-02 17:18:50Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
0 static char rcsid[] = "$Id: samprint.c 195961 2016-08-08 16:36:34Z twu $";
0 static char rcsid[] = "$Id: samprint.c 196273 2016-08-12 15:15:06Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
22612261
22622262
22632263 if (sensep == true) {
2264 assert(Substring_chimera_pos(donor) == Substring_queryend(donor));
2264 assert(Substring_siteD_pos(donor) == Substring_queryend(donor));
22652265 if (plusp == true) {
22662266 /* sensep true, plusp true */
22672267 /* FPRINTF(fp,"donor sensep true, plusp true\n"); */
23202320 }
23212321
23222322 } else {
2323 assert(Substring_chimera_pos(donor) == Substring_querystart(donor));
2323 assert(Substring_siteD_pos(donor) == Substring_querystart(donor));
23242324 if (plusp == true) {
23252325 /* sensep false, plusp true */
23262326 /* FPRINTF(fp,"donor sensep false, plusp true\n"); */
26682668 /* 12. TAGS: XT */
26692669 if (print_xt_p == true) {
26702670 FPRINTF(fp,"\tXT:Z:%c%c-%c%c,%.2f,%.2f",donor1,donor2,acceptor2,acceptor1,donor_prob,acceptor_prob);
2671 FPRINTF(fp,",%c%s@%u..%c%s@%u",donor_strand,donor_chr,Substring_chr_splicecoord(donor),
2672 acceptor_strand,acceptor_chr,Substring_chr_splicecoord(acceptor));
2671 FPRINTF(fp,",%c%s@%u..%c%s@%u",donor_strand,donor_chr,Substring_chr_splicecoord_D(donor),
2672 acceptor_strand,acceptor_chr,Substring_chr_splicecoord_A(acceptor));
26732673 }
26742674
26752675 /* 12. TAGS: XC */
27752775
27762776
27772777 if (sensep == true) {
2778 assert(Substring_chimera_pos(acceptor) == Substring_querystart(acceptor));
2778 assert(Substring_siteA_pos(acceptor) == Substring_querystart(acceptor));
27792779 if (plusp == true) {
27802780 /* sensep true, plusp true */
27812781 /* FPRINTF(fp,"acceptor sensep true, plusp true\n"); */
28292829
28302830 } else {
28312831 /* sensep false, plusp true */
2832 assert(Substring_chimera_pos(acceptor) == Substring_queryend(acceptor));
2832 assert(Substring_siteA_pos(acceptor) == Substring_queryend(acceptor));
28332833 if (plusp == true) {
28342834 /* FPRINTF(fp,"acceptor sensep false, plusp true\n"); */
28352835 if (hide_soft_clips_p == true) {
31733173 /* 12. TAGS: XT */
31743174 if (print_xt_p == true) {
31753175 FPRINTF(fp,"\tXT:Z:%c%c-%c%c,%.2f,%.2f",donor1,donor2,acceptor2,acceptor1,donor_prob,acceptor_prob);
3176 FPRINTF(fp,",%c%s@%u..%c%s@%u",donor_strand,donor_chr,Substring_chr_splicecoord(donor),
3177 acceptor_strand,acceptor_chr,Substring_chr_splicecoord(acceptor));
3176 FPRINTF(fp,",%c%s@%u..%c%s@%u",donor_strand,donor_chr,Substring_chr_splicecoord_D(donor),
3177 acceptor_strand,acceptor_chr,Substring_chr_splicecoord_A(acceptor));
31783178 }
31793179
31803180
32313231 halfacceptor_dinucleotide(&acceptor2,&acceptor1,acceptor,sensedir);
32323232 donor_chr = Univ_IIT_label(chromosome_iit,Substring_chrnum(donor),&allocp);
32333233 acceptor_chr = Univ_IIT_label(chromosome_iit,Substring_chrnum(acceptor),&allocp);
3234 donor_prob = Substring_chimera_prob(donor);
3235 acceptor_prob = Substring_chimera_prob(acceptor);
3234 donor_prob = Substring_siteD_prob(donor);
3235 acceptor_prob = Substring_siteA_prob(acceptor);
32363236
32373237 /* Code taken from that for XS tag for print_halfdonor and print_halfacceptor */
32383238 /* For the donor and acceptor strands, use the substring sensedir and not the Stage3end_T sensedir */
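Note on the samprint.c hunks above: the XT tag is assembled from the donor and acceptor dinucleotides, the two splice-site probabilities, and a strand, chromosome and splice coordinate for each side. The following sketch reuses the two format strings from the diff, joined into one and without the leading tab, to show the resulting text. format_xt_tag is a hypothetical helper, and the sample values used with it are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes an XT tag into buf using the same field order as the FPRINTF calls
   in the diff: donor1 donor2 - acceptor2 acceptor1, the two probabilities,
   then strand/chromosome/coordinate for donor and acceptor. Returns buf. */
char *format_xt_tag(char *buf, size_t buflen,
                    char donor1, char donor2, char acceptor2, char acceptor1,
                    double donor_prob, double acceptor_prob,
                    char donor_strand, const char *donor_chr, unsigned int donor_coord,
                    char acceptor_strand, const char *acceptor_chr, unsigned int acceptor_coord) {
  int n = snprintf(buf, buflen, "XT:Z:%c%c-%c%c,%.2f,%.2f,%c%s@%u..%c%s@%u",
                   donor1, donor2, acceptor2, acceptor1, donor_prob, acceptor_prob,
                   donor_strand, donor_chr, donor_coord,
                   acceptor_strand, acceptor_chr, acceptor_coord);
  assert(n >= 0 && (size_t) n < buflen);  /* the tag must fit in the caller's buffer */
  return buf;
}
```

In the diff, the donor coordinate comes from Substring_chr_splicecoord_D and the acceptor coordinate from Substring_chr_splicecoord_A. The sketch takes them as plain unsigned integers.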
0 static char rcsid[] = "$Id: sarray-read.c 195763 2016-08-04 01:37:20Z twu $";
0 static char rcsid[] = "$Id: sarray-read.c 196431 2016-08-16 20:19:22Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
23482348 this->all_positions = (Univcoord_T *) NULL;
23492349
23502350 } else {
2351 /* Function surrounded by HAVE_ALLOCA */
23512352 #ifdef USE_QSORT
23522353 positions_temp = out = (Univcoord_T *) MALLOCA((this->finalptr - this->initptr + 1) * sizeof(Univcoord_T));
23532354 #else
25022503 #endif
25032504 }
25042505
2506 /* Function surrounded by HAVE_ALLOCA */
25052507 FREEA(positions_temp);
25062508 }
25072509
25982600 this->all_positions = (Univcoord_T *) NULL;
25992601
26002602 } else {
2603 /* Function surrounded by HAVE_ALLOCA */
26012604 positions_temp = out = (Univcoord_T *) MALLOCA((this->finalptr - this->initptr + 1) * sizeof(Univcoord_T));
26022605
26032606 low_adj = low + this->querystart;
27502753 #endif
27512754 }
27522755
2756 /* Function surrounded by HAVE_ALLOCA */
27532757 FREEA(positions_temp);
27542758 }
27552759
27902794 this->all_positions = (Univcoord_T *) NULL;
27912795
27922796 } else {
2797 /* Function surrounded by HAVE_ALLOCA */
27932798 #ifdef USE_QSORT
27942799 positions_temp = (Univcoord_T *) MALLOCA((this->finalptr - this->initptr + 1) * sizeof(Univcoord_T));
27952800 #else
29702975 #endif
29712976 }
29722977
2978 /* Function surrounded by HAVE_ALLOCA */
29732979 FREEA(positions_temp);
29742980 }
29752981
36513657 int k, j, i, n;
36523658 bool segmenti_usedp, segmentj_usedp;
36533659 bool foundp;
3654
3655 #ifdef HAVE_ALLOCA
3656 int *segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3657 int *segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3658 int *segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3659 int *segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3660 int *segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3661 int *segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3662 int *segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3663 int *segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3664 #else
3665 int segmenti_donor_knownpos[MAX_READLENGTH+1], segmentj_acceptor_knownpos[MAX_READLENGTH+1],
3666 segmentj_antidonor_knownpos[MAX_READLENGTH+1], segmenti_antiacceptor_knownpos[MAX_READLENGTH+1];
3667 int segmenti_donor_knowni[MAX_READLENGTH+1], segmentj_acceptor_knowni[MAX_READLENGTH+1],
3668 segmentj_antidonor_knowni[MAX_READLENGTH+1], segmenti_antiacceptor_knowni[MAX_READLENGTH+1];
3669 #endif
3660 int *segmenti_donor_knownpos, *segmentj_acceptor_knownpos, *segmentj_antidonor_knownpos, *segmenti_antiacceptor_knownpos,
3661 *segmenti_donor_knowni, *segmentj_acceptor_knowni, *segmentj_antidonor_knowni, *segmenti_antiacceptor_knowni;
36703662
36713663
36723664 /* Potential success */
36793671 return false;
36803672 } else {
36813673 left = goal /* - querylength */;
3682 }
3674
3675 #ifdef HAVE_ALLOCA
3676 if (querylength <= MAX_STACK_READLENGTH) {
3677 segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3678 segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3679 segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3680 segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
3681 segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3682 segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3683 segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3684 segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
3685 } else {
3686 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3687 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3688 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3689 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3690 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3691 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3692 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3693 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3694 }
3695 #else
3696 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3697 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3698 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3699 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
3700 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3701 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3702 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3703 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
3704 #endif
3705 }
3706
36833707
36843708 nsame = ndiff = 0;
36853709 querystart_diff = querylength;
37923816 debug7(printf("same is at %u from %d to %d\n",left,querystart_same,queryend_same));
37933817
37943818 n = Uintlist_length(difflist);
3819 #ifdef HAVE_ALLOCA
37953820 #ifdef USE_QSORT
37963821 array = (UINT4 *) MALLOCA(n * sizeof(UINT4));
37973822 #else
37983823 array = (UINT4 *) MALLOCA((n + 1) * sizeof(UINT4));
37993824 #endif
3825 #else
3826 #ifdef USE_QSORT
3827 array = (UINT4 *) MALLOC(n * sizeof(UINT4));
3828 #else
3829 array = (UINT4 *) MALLOC((n + 1) * sizeof(UINT4));
3830 #endif
3831 #endif
3832
38003833 Uintlist_fill_array_and_free(array,&difflist);
38013834 #ifdef USE_QSORT
38023835 qsort(array,n,sizeof(Univcoord_T),Univcoord_compare);
40004033 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
40014034 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
40024035 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
4003 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4004 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4036 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4037 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
40054038 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
40064039 best_nmismatches = nmismatches;
40074040 }
40174050 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
40184051 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
40194052 debug7(printf("accepting distance %d, probabilities %f and %f\n",
4020 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4021 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4053 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4054 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
40224055 n_good_spliceends += 1;
40234056 accepted_hits = List_push(accepted_hits,(void *) hit);
40244057 } else {
40344067 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP ||
40354068 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
40364069 debug7(printf("accepting distance %d, probabilities %f and %f\n",
4037 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4038 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4070 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4071 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
40394072 n_good_spliceends += 1;
40404073 accepted_hits = List_push(accepted_hits,(void *) hit);
40414074 } else {
41114144 for (k = i; k < j; k++) {
41124145 acceptor = Stage3end_substring_acceptor(hitarray[k]);
41134146 #ifdef LARGE_GENOMES
4114 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
4115 #else
4116 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
4147 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
4148 #else
4149 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
41174150 #endif
41184151 amb_knowni = Intlist_push(amb_knowni,-1);
41194152 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
4120 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
4153 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
41214154 }
41224155
41234156 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
4124 prob = best_prob - Substring_chimera_prob(donor);
4157 prob = best_prob - Substring_siteD_prob(donor);
41254158 *ambiguous = List_push(*ambiguous,
41264159 (void *) Stage3end_new_splice(&(*found_score),
41274160 /*nmismatches_donor*/Substring_nmismatches_whole(donor),nmismatches_acceptor,
41744207 for (k = i; k < j; k++) {
41754208 donor = Stage3end_substring_donor(hitarray[k]);
41764209 #ifdef LARGE_GENOMES
4177 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
4178 #else
4179 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
4210 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
4211 #else
4212 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
41804213 #endif
41814214 amb_knowni = Intlist_push(amb_knowni,-1);
41824215 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
4183 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
4216 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
41844217 }
41854218
41864219 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
4187 prob = best_prob - Substring_chimera_prob(acceptor);
4220 prob = best_prob - Substring_siteA_prob(acceptor);
41884221 *ambiguous = List_push(*ambiguous,
41894222 (void *) Stage3end_new_splice(&(*found_score),
41904223 nmismatches_donor,/*nmismatches_acceptor*/Substring_nmismatches_whole(acceptor),
42294262 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
42304263 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
42314264 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
4232 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4233 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4265 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4266 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
42344267 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
42354268 best_nmismatches = nmismatches;
42364269 }
42484281 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
42494282 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
42504283 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
4251 Substring_chimera_prob(Stage3end_substring_donor(hit)),
4252 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4284 Substring_siteD_prob(Stage3end_substring_donor(hit)),
4285 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
42534286 n_good_spliceends += 1;
42544287 accepted_hits = List_push(accepted_hits,(void *) hit);
42554288 } else {
42674300 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
42684301 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
42694302 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
4270 Substring_chimera_prob(Stage3end_substring_donor(hit)),
4271 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4303 Substring_siteD_prob(Stage3end_substring_donor(hit)),
4304 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
42724305 n_good_spliceends += 1;
42734306 accepted_hits = List_push(accepted_hits,(void *) hit);
42744307 } else {
43444377 for (k = i; k < j; k++) {
43454378 acceptor = Stage3end_substring_acceptor(hitarray[k]);
43464379 #ifdef LARGE_GENOMES
4347 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
4348 #else
4349 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
4380 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
4381 #else
4382 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
43504383 #endif
43514384 amb_knowni = Intlist_push(amb_knowni,-1);
43524385 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
4353 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
4386 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
43544387 }
43554388
43564389 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
4357 prob = best_prob - Substring_chimera_prob(donor);
4390 prob = best_prob - Substring_siteD_prob(donor);
43584391 *ambiguous = List_push(*ambiguous,
43594392 (void *) Stage3end_new_splice(&(*found_score),
43604393 /*nmismatches_donor*/Substring_nmismatches_whole(donor),nmismatches_acceptor,
44074440 for (k = i; k < j; k++) {
44084441 donor = Stage3end_substring_donor(hitarray[k]);
44094442 #ifdef LARGE_GENOMES
4410 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
4411 #else
4412 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
4443 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
4444 #else
4445 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
44134446 #endif
44144447 amb_knowni = Intlist_push(amb_knowni,-1);
44154448 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
4416 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
4449 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
44174450 }
44184451
44194452 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
4420 prob = best_prob - Substring_chimera_prob(acceptor);
4453 prob = best_prob - Substring_siteA_prob(acceptor);
44214454 *ambiguous = List_push(*ambiguous,
44224455 (void *) Stage3end_new_splice(&(*found_score),
44234456 nmismatches_donor,/*nmismatches_acceptor*/Substring_nmismatches_whole(acceptor),
44594492 }
44604493 List_free(&lowprob);
44614494
4495 #ifdef HAVE_ALLOCA
44624496 FREEA(array);
4497 #else
4498 FREE(array);
4499 #endif
44634500
44644501 } else if (querystart_diff == 0 && queryend_same == querylength - 1) {
44654502 left2 = left;
44674504 debug7(printf("same is at %u from %d to %d\n",left,querystart_same,queryend_same));
44684505
44694506 n = Uintlist_length(difflist);
4507 #ifdef HAVE_ALLOCA
44704508 #ifdef USE_QSORT
44714509 array = (UINT4 *) MALLOCA(n * sizeof(UINT4));
44724510 #else
44734511 array = (UINT4 *) MALLOCA((n + 1) * sizeof(UINT4));
44744512 #endif
4513 #else
4514 #ifdef USE_QSORT
4515 array = (UINT4 *) MALLOC(n * sizeof(UINT4));
4516 #else
4517 array = (UINT4 *) MALLOC((n + 1) * sizeof(UINT4));
4518 #endif
4519 #endif
4520
44754521 Uintlist_fill_array_and_free(array,&difflist);
44764522 #ifdef USE_QSORT
44774523 qsort(array,n,sizeof(Univcoord_T),Univcoord_compare);
46644710 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
46654711 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
46664712 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
4667 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4668 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4713 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4714 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
46694715 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
46704716 best_nmismatches = nmismatches;
46714717 }
46814727 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
46824728 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
46834729 debug7(printf("accepting distance %d, probabilities %f and %f\n",
4684 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4685 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4730 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4731 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
46864732 n_good_spliceends += 1;
46874733 accepted_hits = List_push(accepted_hits,(void *) hit);
46884734 } else {
47004746 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
47014747 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
47024748 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
4703 Substring_chimera_prob(Stage3end_substring_donor(hit)),
4704 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4749 Substring_siteD_prob(Stage3end_substring_donor(hit)),
4750 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
47054751 n_good_spliceends += 1;
47064752 accepted_hits = List_push(accepted_hits,(void *) hit);
47074753 } else {
47774823 for (k = i; k < j; k++) {
47784824 acceptor = Stage3end_substring_acceptor(hitarray[k]);
47794825 #ifdef LARGE_GENOMES
4780 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
4781 #else
4782 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
4826 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
4827 #else
4828 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
47834829 #endif
47844830 amb_knowni = Intlist_push(amb_knowni,-1);
47854831 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
4786 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
4832 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
47874833 }
47884834
47894835 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
4790 prob = best_prob - Substring_chimera_prob(donor);
4836 prob = best_prob - Substring_siteD_prob(donor);
47914837 *ambiguous = List_push(*ambiguous,
47924838 (void *) Stage3end_new_splice(&(*found_score),
47934839 /*nmismatches_donor*/Substring_nmismatches_whole(donor),nmismatches_acceptor,
48414887 for (k = i; k < j; k++) {
48424888 donor = Stage3end_substring_donor(hitarray[k]);
48434889 #ifdef LARGE_GENOMES
4844 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
4845 #else
4846 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
4890 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
4891 #else
4892 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
48474893 #endif
48484894 amb_knowni = Intlist_push(amb_knowni,-1);
48494895 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
4850 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
4896 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
48514897 }
48524898
48534899 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
4854 prob = best_prob - Substring_chimera_prob(acceptor);
4900 prob = best_prob - Substring_siteA_prob(acceptor);
48554901 *ambiguous = List_push(*ambiguous,
48564902 (void *) Stage3end_new_splice(&(*found_score),
48574903 nmismatches_donor,/*nmismatches_acceptor*/Substring_nmismatches_whole(acceptor),
48964942 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
48974943 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
48984944 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
4899 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4900 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4945 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4946 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
49014947 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
49024948 best_nmismatches = nmismatches;
49034949 }
49134959 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
49144960 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
49154961 debug7(printf("accepting distance %d, probabilities %f and %f\n",
4916 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
4917 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4962 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
4963 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
49184964 n_good_spliceends += 1;
49194965 accepted_hits = List_push(accepted_hits,(void *) hit);
49204966 } else {
49324978 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
49334979 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
49344980 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
4935 Substring_chimera_prob(Stage3end_substring_donor(hit)),
4936 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
4981 Substring_siteD_prob(Stage3end_substring_donor(hit)),
4982 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
49374983 n_good_spliceends += 1;
49384984 accepted_hits = List_push(accepted_hits,(void *) hit);
49394985 } else {
50095055 for (k = i; k < j; k++) {
50105056 acceptor = Stage3end_substring_acceptor(hitarray[k]);
50115057 #ifdef LARGE_GENOMES
5012 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
5013 #else
5014 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
5058 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
5059 #else
5060 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
50155061 #endif
50165062 amb_knowni = Intlist_push(amb_knowni,-1);
50175063 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
5018 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
5064 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
50195065 }
50205066
50215067 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
5022 prob = best_prob - Substring_chimera_prob(donor);
5068 prob = best_prob - Substring_siteD_prob(donor);
50235069 *ambiguous = List_push(*ambiguous,
50245070 (void *) Stage3end_new_splice(&(*found_score),
50255071 /*nmismatches_donor*/Substring_nmismatches_whole(donor),nmismatches_acceptor,
50725118 for (k = i; k < j; k++) {
50735119 donor = Stage3end_substring_donor(hitarray[k]);
50745120 #ifdef LARGE_GENOMES
5075 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
5076 #else
5077 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
5121 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
5122 #else
5123 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
50785124 #endif
50795125 amb_knowni = Intlist_push(amb_knowni,-1);
50805126 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
5081 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
5127 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
50825128 }
50835129
50845130 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
5085 prob = best_prob - Substring_chimera_prob(acceptor);
5131 prob = best_prob - Substring_siteA_prob(acceptor);
50865132 *ambiguous = List_push(*ambiguous,
50875133 (void *) Stage3end_new_splice(&(*found_score),
50885134 nmismatches_donor,/*nmismatches_acceptor*/Substring_nmismatches_whole(acceptor),
51255171 }
51265172 List_free(&lowprob);
51275173
5174 #ifdef HAVE_ALLOCA
51285175 FREEA(array);
5176 #else
5177 FREE(array);
5178 #endif
51295179
51305180 } else {
51315181 Uintlist_free(&difflist);
51325182 }
5183
5184
5185 #ifdef HAVE_ALLOCA
5186 if (querylength <= MAX_STACK_READLENGTH) {
5187 FREEA(segmenti_donor_knownpos);
5188 FREEA(segmentj_acceptor_knownpos);
5189 FREEA(segmentj_antidonor_knownpos);
5190 FREEA(segmenti_antiacceptor_knownpos);
5191 FREEA(segmenti_donor_knowni);
5192 FREEA(segmentj_acceptor_knowni);
5193 FREEA(segmentj_antidonor_knowni);
5194 FREEA(segmenti_antiacceptor_knowni);
5195 } else {
5196 FREE(segmenti_donor_knownpos);
5197 FREE(segmentj_acceptor_knownpos);
5198 FREE(segmentj_antidonor_knownpos);
5199 FREE(segmenti_antiacceptor_knownpos);
5200 FREE(segmenti_donor_knowni);
5201 FREE(segmentj_acceptor_knowni);
5202 FREE(segmentj_antidonor_knowni);
5203 FREE(segmenti_antiacceptor_knowni);
5204 }
5205 #else
5206 FREE(segmenti_donor_knownpos);
5207 FREE(segmentj_acceptor_knownpos);
5208 FREE(segmentj_antidonor_knownpos);
5209 FREE(segmenti_antiacceptor_knownpos);
5210 FREE(segmenti_donor_knowni);
5211 FREE(segmentj_acceptor_knowni);
5212 FREE(segmentj_antidonor_knowni);
5213 FREE(segmenti_antiacceptor_knowni);
5214 #endif
51335215
51345216 return twopartp;
51355217 }
53235405
53245406 #ifdef SUBDIVIDE_NOMATCHES
53255407 /* Try to subdivide elts that have no matches */
5408 #ifdef HAVE_ALLOCA
53265409 coveredp = (bool *) CALLOCA(querylength,sizeof(bool));
5327 mappings = (Chrpos_T **) MALLOCA(querylength * sizeof(Chrpos_T *));
5410 mappings = (Chrpos_T **) ALLOCA(querylength * sizeof(Chrpos_T *));
53285411 npositions = (int *) CALLOCA(querylength,sizeof(int));
5412 #else
5413 coveredp = (bool *) CALLOC(querylength,sizeof(bool));
5414 mappings = (Chrpos_T **) MALLOC(querylength * sizeof(Chrpos_T *));
5415 npositions = (int *) CALLOC(querylength,sizeof(int));
5416 #endif
53295417 oligoindex = Oligoindex_array_elt(oligoindices_minor,/*source*/0);
53305418 indexsize = Oligoindex_indexsize(oligoindex);
53315419
59356023
59366024 int segmenti_donor_nknown, segmentj_acceptor_nknown,
59376025 segmentj_antidonor_nknown, segmenti_antiacceptor_nknown;
6026 int *segmenti_donor_knownpos, *segmentj_acceptor_knownpos, *segmentj_antidonor_knownpos, *segmenti_antiacceptor_knownpos,
6027 *segmenti_donor_knowni, *segmentj_acceptor_knowni, *segmentj_antidonor_knowni, *segmenti_antiacceptor_knowni;
6028 int j;
6029
59386030 #ifdef HAVE_ALLOCA
5939 int *segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
5940 int *segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
5941 int *segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
5942 int *segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
5943 int *segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
5944 int *segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
5945 int *segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
5946 int *segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
5947 #else
5948 int segmenti_donor_knownpos[MAX_READLENGTH+1], segmentj_acceptor_knownpos[MAX_READLENGTH+1],
5949 segmentj_antidonor_knownpos[MAX_READLENGTH+1], segmenti_antiacceptor_knownpos[MAX_READLENGTH+1];
5950 int segmenti_donor_knowni[MAX_READLENGTH+1], segmentj_acceptor_knowni[MAX_READLENGTH+1],
5951 segmentj_antidonor_knowni[MAX_READLENGTH+1], segmenti_antiacceptor_knowni[MAX_READLENGTH+1];
5952 #endif
5953
5954 int j;
6031 if (querylength <= MAX_STACK_READLENGTH) {
6032 segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
6033 segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
6034 segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
6035 segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
6036 segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
6037 segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
6038 segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
6039 segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
6040 } else {
6041 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6042 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6043 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6044 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6045 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6046 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6047 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6048 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6049 }
6050 #else
6051 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6052 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6053 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6054 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
6055 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6056 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6057 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6058 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
6059 #endif
6060
59556061
59566062 debug13(printf("***Entered find_best_path\n"));
59576063
66716777 debug13(printf("***Exiting find_best_path\n"));
66726778
66736779 #ifdef SUBDIVIDE_ENDS
6780 #ifdef HAVE_ALLOCA
66746781 FREEA(npositions);
66756782 FREEA(coveredp);
66766783 FREEA(mappings);
6784 #else
6785 FREE(npositions);
6786 FREE(coveredp);
6787 FREE(mappings);
6788 #endif
6789 #endif
6790
6791
6792 #ifdef HAVE_ALLOCA
6793 if (querylength <= MAX_STACK_READLENGTH) {
6794 FREEA(segmenti_donor_knownpos);
6795 FREEA(segmentj_acceptor_knownpos);
6796 FREEA(segmentj_antidonor_knownpos);
6797 FREEA(segmenti_antiacceptor_knownpos);
6798 FREEA(segmenti_donor_knowni);
6799 FREEA(segmentj_acceptor_knowni);
6800 FREEA(segmentj_antidonor_knowni);
6801 FREEA(segmenti_antiacceptor_knowni);
6802 } else {
6803 FREE(segmenti_donor_knownpos);
6804 FREE(segmentj_acceptor_knownpos);
6805 FREE(segmentj_antidonor_knownpos);
6806 FREE(segmenti_antiacceptor_knownpos);
6807 FREE(segmenti_donor_knowni);
6808 FREE(segmentj_acceptor_knowni);
6809 FREE(segmentj_antidonor_knowni);
6810 FREE(segmenti_antiacceptor_knowni);
6811 }
6812 #else
6813 FREE(segmenti_donor_knownpos);
6814 FREE(segmentj_acceptor_knownpos);
6815 FREE(segmentj_antidonor_knownpos);
6816 FREE(segmenti_antiacceptor_knownpos);
6817 FREE(segmenti_donor_knowni);
6818 FREE(segmentj_acceptor_knowni);
6819 FREE(segmentj_antidonor_knowni);
6820 FREE(segmenti_antiacceptor_knowni);
66776821 #endif
66786822
66796823 return middle_path;
72397383 left_ambig_sense, left_ambig_antisense;
72407384 int segmenti_donor_nknown, segmentj_acceptor_nknown,
72417385 segmentj_antidonor_nknown, segmenti_antiacceptor_nknown;
7386 int *segmenti_donor_knownpos, *segmentj_acceptor_knownpos, *segmentj_antidonor_knownpos, *segmenti_antiacceptor_knownpos,
7387 *segmenti_donor_knowni, *segmentj_acceptor_knowni, *segmentj_antidonor_knowni, *segmenti_antiacceptor_knowni;
72427388 int j;
72437389
72447390 #ifdef HAVE_ALLOCA
7245 int *segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7246 int *segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7247 int *segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7248 int *segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7249 int *segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7250 int *segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7251 int *segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7252 int *segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7253 #else
7254 int segmenti_donor_knownpos[MAX_READLENGTH+1], segmentj_acceptor_knownpos[MAX_READLENGTH+1],
7255 segmentj_antidonor_knownpos[MAX_READLENGTH+1], segmenti_antiacceptor_knownpos[MAX_READLENGTH+1];
7256 int segmenti_donor_knowni[MAX_READLENGTH+1], segmentj_acceptor_knowni[MAX_READLENGTH+1],
7257 segmentj_antidonor_knowni[MAX_READLENGTH+1], segmenti_antiacceptor_knowni[MAX_READLENGTH+1];
7391 if (querylength <= MAX_STACK_READLENGTH) {
7392 segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7393 segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7394 segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7395 segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7396 segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7397 segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7398 segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7399 segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7400 } else {
7401 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7402 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7403 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7404 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7405 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7406 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7407 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7408 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7409 }
7410 #else
7411 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7412 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7413 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7414 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7415 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7416 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7417 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7418 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
72587419 #endif
72597420
72607421
76377798 /* sense_endpoints = Intlist_push(sense_endpoints,queryend); */
76387799
76397800 if (plusp == true) {
7640 right_ambig_sense = Substring_new_ambig(/*querystart*/splice_pos,queryend,
7641 /*splice_pos*/splice_pos,querylength,
7642 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7643 right_ambcoords_sense,right_amb_knowni_sense,
7644 right_amb_nmismatchesj_sense,right_amb_probsj_sense,
7645 /*amb_common_prob*/Doublelist_head(right_amb_probsi_sense),
7646 /*amb_donor_common_p*/true,/*substring1p*/false);
7801 right_ambig_sense = Substring_new_ambig_A(/*querystart*/splice_pos,queryend,
7802 /*splice_pos*/splice_pos,querylength,
7803 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7804 right_ambcoords_sense,right_amb_knowni_sense,
7805 right_amb_nmismatchesj_sense,right_amb_probsj_sense,
7806 /*amb_common_prob*/Doublelist_head(right_amb_probsi_sense),
7807 /*substring1p*/false);
76477808 } else {
7648 right_ambig_sense = Substring_new_ambig(/*querystart*/querylength - queryend,querylength - splice_pos,
7649 /*splice_pos*/querylength - splice_pos,querylength,
7650 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7651 right_ambcoords_sense,right_amb_knowni_sense,
7652 right_amb_nmismatchesj_sense,right_amb_probsj_sense,
7653 /*amb_common_prob*/Doublelist_head(right_amb_probsi_sense),
7654 /*amb_donor_common_p*/false,/*substring1p*/true);
7809 right_ambig_sense = Substring_new_ambig_D(/*querystart*/querylength - queryend,querylength - splice_pos,
7810 /*splice_pos*/querylength - splice_pos,querylength,
7811 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7812 right_ambcoords_sense,right_amb_knowni_sense,
7813 right_amb_nmismatchesj_sense,right_amb_probsj_sense,
7814 /*amb_common_prob*/Doublelist_head(right_amb_probsi_sense),
7815 /*substring1p*/true);
76557816 }
76567817 }
76577818
77127873 /* antisense_endpoints = Intlist_push(antisense_endpoints,queryend); */
77137874
77147875 if (plusp == true) {
7715 right_ambig_antisense = Substring_new_ambig(/*querystart*/splice_pos,queryend,
7716 /*splice_pos*/splice_pos,querylength,
7717 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7718 right_ambcoords_antisense,right_amb_knowni_antisense,
7719 right_amb_nmismatchesj_antisense,right_amb_probsj_antisense,
7720 /*amb_common_prob*/Doublelist_head(right_amb_probsi_antisense),
7721 /*amb_donor_common_p*/false,/*substring1p*/false);
7876 right_ambig_antisense = Substring_new_ambig_D(/*querystart*/splice_pos,queryend,
7877 /*splice_pos*/splice_pos,querylength,
7878 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7879 right_ambcoords_antisense,right_amb_knowni_antisense,
7880 right_amb_nmismatchesj_antisense,right_amb_probsj_antisense,
7881 /*amb_common_prob*/Doublelist_head(right_amb_probsi_antisense),
7882 /*substring1p*/false);
77227883 } else {
7723 right_ambig_antisense = Substring_new_ambig(/*querystart*/querylength - queryend,querylength - splice_pos,
7724 /*splice_pos*/querylength - splice_pos,querylength,
7725 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7726 right_ambcoords_antisense,right_amb_knowni_antisense,
7727 right_amb_nmismatchesj_antisense,right_amb_probsj_antisense,
7728 /*amb_common_prob*/Doublelist_head(right_amb_probsi_antisense),
7729 /*amb_donor_common_p*/true,/*substring1p*/true);
7884 right_ambig_antisense = Substring_new_ambig_A(/*querystart*/querylength - queryend,querylength - splice_pos,
7885 /*splice_pos*/querylength - splice_pos,querylength,
7886 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7887 right_ambcoords_antisense,right_amb_knowni_antisense,
7888 right_amb_nmismatchesj_antisense,right_amb_probsj_antisense,
7889 /*amb_common_prob*/Doublelist_head(right_amb_probsi_antisense),
7890 /*substring1p*/true);
77307891 }
77317892 }
77327893
78357996 /* sense_endpoints = Intlist_push(sense_endpoints,querystart); */
78367997
78377998 if (plusp == true) {
7838 left_ambig_sense = Substring_new_ambig(querystart,/*queryend*/splice_pos,
7839 /*splice_pos*/splice_pos,querylength,
7840 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7841 left_ambcoords_sense,left_amb_knowni_sense,
7842 left_amb_nmismatchesi_sense,left_amb_probsi_sense,
7843 /*amb_common_prob*/Doublelist_head(left_amb_probsj_sense),
7844 /*amb_donor_common_p*/false,/*substring1p*/true);
7999 left_ambig_sense = Substring_new_ambig_D(querystart,/*queryend*/splice_pos,
8000 /*splice_pos*/splice_pos,querylength,
8001 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
8002 left_ambcoords_sense,left_amb_knowni_sense,
8003 left_amb_nmismatchesi_sense,left_amb_probsi_sense,
8004 /*amb_common_prob*/Doublelist_head(left_amb_probsj_sense),
8005 /*substring1p*/true);
78458006 } else {
7846 left_ambig_sense = Substring_new_ambig(querylength - splice_pos,/*queryend*/querylength - querystart,
7847 /*splice_pos*/querylength - splice_pos,querylength,
7848 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7849 left_ambcoords_sense,left_amb_knowni_sense,
7850 left_amb_nmismatchesi_sense,left_amb_probsi_sense,
7851 /*amb_common_prob*/Doublelist_head(left_amb_probsj_sense),
7852 /*amb_donor_common_p*/true,/*substring1p*/false);
8007 left_ambig_sense = Substring_new_ambig_A(querylength - splice_pos,/*queryend*/querylength - querystart,
8008 /*splice_pos*/querylength - splice_pos,querylength,
8009 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
8010 left_ambcoords_sense,left_amb_knowni_sense,
8011 left_amb_nmismatchesi_sense,left_amb_probsi_sense,
8012 /*amb_common_prob*/Doublelist_head(left_amb_probsj_sense),
8013 /*substring1p*/false);
78538014 }
78548015 }
78558016
79348095 /* antisense_endpoints = Intlist_push(antisense_endpoints,querystart); */
79358096
79368097 if (plusp == true) {
7937 left_ambig_antisense = Substring_new_ambig(querystart,/*queryend*/splice_pos,
7938 /*splice_pos*/splice_pos,querylength,
7939 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7940 left_ambcoords_antisense,left_amb_knowni_antisense,
7941 left_amb_nmismatchesi_antisense,left_amb_probsi_antisense,
7942 /*amb_common_prob*/Doublelist_head(left_amb_probsj_antisense),
7943 /*amb_donor_common_p*/true,/*substring1p*/true);
8098 left_ambig_antisense = Substring_new_ambig_A(querystart,/*queryend*/splice_pos,
8099 /*splice_pos*/splice_pos,querylength,
8100 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
8101 left_ambcoords_antisense,left_amb_knowni_antisense,
8102 left_amb_nmismatchesi_antisense,left_amb_probsi_antisense,
8103 /*amb_common_prob*/Doublelist_head(left_amb_probsj_antisense),
8104 /*substring1p*/true);
79448105 } else {
7945 left_ambig_antisense = Substring_new_ambig(querylength - splice_pos,/*queryend*/querylength - querystart,
7946 /*splice_pos*/querylength - splice_pos,querylength,
7947 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
7948 left_ambcoords_antisense,left_amb_knowni_antisense,
7949 left_amb_nmismatchesi_antisense,left_amb_probsi_antisense,
7950 /*amb_common_prob*/Doublelist_head(left_amb_probsj_antisense),
7951 /*amb_donor_common_p*/false,/*substring1p*/false);
8106 left_ambig_antisense = Substring_new_ambig_D(querylength - splice_pos,/*queryend*/querylength - querystart,
8107 /*splice_pos*/querylength - splice_pos,querylength,
8108 chrnum,chroffset,chrhigh,chrlength,plusp,genestrand,
8109 left_ambcoords_antisense,left_amb_knowni_antisense,
8110 left_amb_nmismatchesi_antisense,left_amb_probsi_antisense,
8111 /*amb_common_prob*/Doublelist_head(left_amb_probsj_antisense),
8112 /*substring1p*/false);
79528113 }
79538114 }
79548115
81298290 Univdiag_free(&diagonal);
81308291 }
81318292 List_free(&super_path);
8293
8294 #ifdef HAVE_ALLOCA
8295 if (querylength <= MAX_STACK_READLENGTH) {
8296 FREEA(segmenti_donor_knownpos);
8297 FREEA(segmentj_acceptor_knownpos);
8298 FREEA(segmentj_antidonor_knownpos);
8299 FREEA(segmenti_antiacceptor_knownpos);
8300 FREEA(segmenti_donor_knowni);
8301 FREEA(segmentj_acceptor_knowni);
8302 FREEA(segmentj_antidonor_knowni);
8303 FREEA(segmenti_antiacceptor_knowni);
8304 } else {
8305 FREE(segmenti_donor_knownpos);
8306 FREE(segmentj_acceptor_knownpos);
8307 FREE(segmentj_antidonor_knownpos);
8308 FREE(segmenti_antiacceptor_knownpos);
8309 FREE(segmenti_donor_knowni);
8310 FREE(segmentj_acceptor_knowni);
8311 FREE(segmentj_antidonor_knowni);
8312 FREE(segmenti_antiacceptor_knowni);
8313 }
8314 #else
8315 FREE(segmenti_donor_knownpos);
8316 FREE(segmentj_acceptor_knownpos);
8317 FREE(segmentj_antidonor_knownpos);
8318 FREE(segmenti_antiacceptor_knownpos);
8319 FREE(segmenti_donor_knowni);
8320 FREE(segmentj_acceptor_knowni);
8321 FREE(segmentj_antidonor_knowni);
8322 FREE(segmenti_antiacceptor_knowni);
8323 #endif
81328324
81338325 return hits;
81348326 }
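
Note on the stage1hr.c hunks above: each affected function now chooses between stack and heap storage at run time by comparing querylength with MAX_STACK_READLENGTH, and releases the memory with the matching call. The following is a minimal sketch of that pattern, not the GMAP code itself. It assumes a platform that provides alloca() through <alloca.h>; the function name scratch_sum and the threshold value 300 are hypothetical, and plain alloca/malloc/free stand in for the ALLOCA/MALLOC/FREEA/FREE macros, whose definitions are not shown in this diff.

```c
#include <alloca.h>   /* non-standard; assumed available (e.g. glibc) */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold for this sketch; the real value comes from the build configuration. */
#define MAX_STACK_READLENGTH 300

/* Fills a scratch array of querylength+1 ints with 0..querylength and
   returns their sum.  Returns -1 if heap allocation fails. */
long scratch_sum(int querylength) {
  int *scratch;
  int on_stack;
  size_t nbytes;
  long total = 0;
  int i;

  assert(querylength >= 0);
  on_stack = (querylength <= MAX_STACK_READLENGTH);
  nbytes = ((size_t) querylength + 1) * sizeof(int);

  if (on_stack) {
    /* Short read: stack storage, released automatically on return. */
    scratch = (int *) alloca(nbytes);
  } else {
    /* Long read: heap storage, must be freed explicitly. */
    scratch = (int *) malloc(nbytes);
    if (scratch == NULL) {
      return -1;
    }
  }

  for (i = 0; i <= querylength; i++) {
    scratch[i] = i;
  }
  for (i = 0; i <= querylength; i++) {
    total += scratch[i];
  }

  /* Release with the call that matches the allocation path taken above. */
  if (!on_stack) {
    free(scratch);
  }
  return total;
}
```

The release step has to test the same condition as the allocation step, which is why the diff repeats the querylength check before every return path. Calling free() on memory obtained from alloca() is undefined behavior.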
0 static char rcsid[] = "$Id: sedgesort.c 195760 2016-08-04 00:12:04Z twu $";
0 static char rcsid[] = "$Id: sedgesort.c 196273 2016-08-12 15:15:06Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
0 /* $Id: sedgesort.h 195760 2016-08-04 00:12:04Z twu $ */
0 /* $Id: sedgesort.h 196273 2016-08-12 15:15:06Z twu $ */
11 #ifndef SEDGESORT_INCLUDED
22 #define SEDGESORT_INCLUDED
33
0 static char rcsid[] = "$Id: shortread.c 195760 2016-08-04 00:12:04Z twu $";
0 static char rcsid[] = "$Id: shortread.c 196410 2016-08-16 15:57:57Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
199199 static char Header[HEADERLEN];
200200 static char Discard[DISCARDLEN];
201201
202 static char Read1[MAX_READLENGTH+1];
203 static char Read2[MAX_READLENGTH+1];
204 static char Quality[MAX_READLENGTH+1];
202
203 /* input_oneline() can actually read longer than this */
204 #define MAX_EXPECTED_READLENGTH 300
205
206 static char Read1[MAX_EXPECTED_READLENGTH+1];
207 static char Read2[MAX_EXPECTED_READLENGTH+1];
208 static char Quality[MAX_EXPECTED_READLENGTH+1];
205209
206210
207211 /* The first element of Sequence is always the null character, to mark
14861490 *longstring = (char *) NULL;
14871491
14881492 ptr = &(Start[0]);
1489 remainder = (&(Start[MAX_READLENGTH]) - ptr)/sizeof(char);
1493 remainder = (&(Start[MAX_EXPECTED_READLENGTH]) - ptr)/sizeof(char);
14901494 if (*nextchar == EOF || (possible_fasta_header_p == true && (*nextchar == '>' || *nextchar == '+'))) {
14911495 debug(printf("nchars %d: EOF or > or +: Returning 0\n",*nchars));
14921496 return 0;
15331537 debug(printf("No line feed, but not end of file. Using Intlist_T.\n"));
15341538 intlist = (Intlist_T) NULL;
15351539 i = 0;
1536 while (i <= MAX_READLENGTH && Start[i] != '\0') {
1540 while (i <= MAX_EXPECTED_READLENGTH && Start[i] != '\0') {
15371541 debug(printf("Pushing %c\n",Start[i]));
15381542 intlist = Intlist_push_in(intlist,Start[i]);
15391543 i++;
15841588 *longstring = (char *) NULL;
15851589
15861590 ptr = &(Start[0]);
1587 remainder = (&(Start[MAX_READLENGTH]) - ptr)/sizeof(char);
1591 remainder = (&(Start[MAX_EXPECTED_READLENGTH]) - ptr)/sizeof(char);
15881592 if (*nextchar == EOF || *nextchar == '\0' ||
15891593 (possible_fasta_header_p == true && (*nextchar == '>' || *nextchar == '+'))) {
15901594 debug(printf("EOF or > or +: Returning 0\n"));
16311635 debug(printf("No line feed, but not end of file. Using Intlist_T.\n"));
16321636 intlist = (Intlist_T) NULL;
16331637 i = 0;
1634 while (i <= MAX_READLENGTH && Start[i] != '\0') {
1638 while (i <= MAX_EXPECTED_READLENGTH && Start[i] != '\0') {
16351639 debug(printf("Pushing %c\n",Start[i]));
16361640 intlist = Intlist_push_in(intlist,Start[i]);
16371641 i++;
16811685 *longstring = (char *) NULL;
16821686
16831687 ptr = &(Start[0]);
1684 remainder = (&(Start[MAX_READLENGTH]) - ptr)/sizeof(char);
1688 remainder = (&(Start[MAX_EXPECTED_READLENGTH]) - ptr)/sizeof(char);
16851689 if (*nextchar == EOF || (possible_fasta_header_p == true && (*nextchar == '>' || *nextchar == '+'))) {
16861690 debug(printf("EOF or > or +: Returning 0\n"));
16871691 return 0;
17301734 debug(printf("No line feed, but not end of file. Using Intlist_T.\n"));
17311735 intlist = (Intlist_T) NULL;
17321736 i = 0;
1733 while (i <= MAX_READLENGTH && Start[i] != '\0') {
1737 while (i <= MAX_EXPECTED_READLENGTH && Start[i] != '\0') {
17341738 debug(printf("Pushing %c\n",Start[i]));
17351739 intlist = Intlist_push_in(intlist,Start[i]);
17361740 i++;
17911795 *longstring = (char *) NULL;
17921796
17931797 ptr = &(Start[0]);
1794 remainder = (&(Start[MAX_READLENGTH]) - ptr)/sizeof(char);
1798 remainder = (&(Start[MAX_EXPECTED_READLENGTH]) - ptr)/sizeof(char);
17951799 if (*nextchar == EOF || (possible_fasta_header_p == true && (*nextchar == '>' || *nextchar == '+'))) {
17961800 debug(printf("EOF or > or +: Returning 0\n"));
17971801 return 0;
18401844 debug(printf("No line feed, but not end of file. Using Intlist_T.\n"));
18411845 intlist = (Intlist_T) NULL;
18421846 i = 0;
1843 while (i <= MAX_READLENGTH && Start[i] != '\0') {
1847 while (i <= MAX_EXPECTED_READLENGTH && Start[i] != '\0') {
18441848 debug(printf("Pushing %c\n",Start[i]));
18451849 intlist = Intlist_push_in(intlist,Start[i]);
18461850 i++;
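
Note on the shortread.c hunks above: the static buffers are now sized for MAX_EXPECTED_READLENGTH, and the accompanying comment says input_oneline() can read longer lines than that. The debug message in the diff indicates that the overflow case is handled with an Intlist_T. The sketch below illustrates the same idea with a different, simpler technique: a fixed-size chunk buffer whose contents are appended to a heap string grown with realloc(). The function name read_line_any_length is hypothetical and is not part of GMAP.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same value as in the diff; used here only as the chunk size. */
#define MAX_EXPECTED_READLENGTH 300

/* Reads one line of any length from fp.  Returns a heap-allocated string
   without the trailing newline, or NULL at end of file or on allocation
   failure.  The caller frees the result. */
char *read_line_any_length(FILE *fp) {
  char chunk[MAX_EXPECTED_READLENGTH + 1];
  char *line = NULL;
  char *grown;
  size_t length = 0;
  size_t n;

  assert(fp != NULL);

  while (fgets(chunk, sizeof(chunk), fp) != NULL) {
    n = strlen(chunk);

    /* Grow the result to hold what we have so far plus this chunk. */
    grown = (char *) realloc(line, length + n + 1);
    if (grown == NULL) {
      free(line);
      return NULL;
    }
    line = grown;
    memcpy(line + length, chunk, n + 1);  /* copies the terminating NUL too */
    length += n;

    /* A newline means the line is complete; otherwise keep reading. */
    if (length > 0 && line[length - 1] == '\n') {
      line[length - 1] = '\0';
      break;
    }
  }
  return line;
}
```

Reads that fit within the expected length complete in a single fgets() call, so the common case costs one allocation; only longer lines go around the loop again.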
0 static char rcsid[] = "$Id: splice.c 195753 2016-08-03 23:44:46Z twu $";
0 static char rcsid[] = "$Id: splice.c 196431 2016-08-16 20:19:22Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
140140 int donori_nsites, acceptorj_nsites, antiacceptori_nsites, antidonorj_nsites;
141141 int *donori_positions, *acceptorj_positions, *antiacceptori_positions, *antidonorj_positions;
142142 int *donori_knowni, *acceptorj_knowni, *antiacceptori_knowni, *antidonorj_knowni;
143 int *donor_positions_alloc, *acceptor_positions_alloc, *donor_knowni_alloc, *acceptor_knowni_alloc;
144
143145
144146 #ifdef HAVE_ALLOCA
145 int *donor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
146 int *acceptor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
147 int *donor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
148 int *acceptor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
147 if (querylength <= MAX_STACK_READLENGTH) {
148 donor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
149 acceptor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
150 donor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
151 acceptor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
152 } else {
153 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
154 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
155 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
156 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
157 }
149158 #else
150 int donor_positions_alloc[MAX_READLENGTH+1], acceptor_positions_alloc[MAX_READLENGTH+1];
151 int donor_knowni_alloc[MAX_READLENGTH+1], acceptor_knowni_alloc[MAX_READLENGTH+1];
159 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
160 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
161 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
162 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
152163 #endif
153164
154165
404415
405416 debug1(printf("best_knowni_i is %d and best_knowni_j is %d\n",*best_knowni_i,*best_knowni_j));
406417
418 #ifdef HAVE_ALLOCA
419 if (querylength <= MAX_STACK_READLENGTH) {
420 FREEA(donor_positions_alloc);
421 FREEA(acceptor_positions_alloc);
422 FREEA(donor_knowni_alloc);
423 FREEA(acceptor_knowni_alloc);
424 } else {
425 FREE(donor_positions_alloc);
426 FREE(acceptor_positions_alloc);
427 FREE(donor_knowni_alloc);
428 FREE(acceptor_knowni_alloc);
429 }
430 #else
431 FREE(donor_positions_alloc);
432 FREE(acceptor_positions_alloc);
433 FREE(donor_knowni_alloc);
434 FREE(acceptor_knowni_alloc);
435 #endif
436
437
407438 if (*best_prob_i > 0.95 && *best_prob_j > 0.70) {
408439 debug1(printf("Returning %d with probi %f and probj %f\n",best_splice_pos,*best_prob_i,*best_prob_j));
409440 debug1(printf("nmismatches %d and %d\n",*best_nmismatches_i,*best_nmismatches_j));
450481 int donori_nsites, acceptorj_nsites, antiacceptori_nsites, antidonorj_nsites;
451482 int *donori_positions, *acceptorj_positions, *antiacceptori_positions, *antidonorj_positions;
452483 int *donori_knowni, *acceptorj_knowni, *antiacceptori_knowni, *antidonorj_knowni;
484 int *donor_positions_alloc, *acceptor_positions_alloc, *donor_knowni_alloc, *acceptor_knowni_alloc;
485
453486
454487 #ifdef HAVE_ALLOCA
455 int *donor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
456 int *acceptor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
457 int *donor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
458 int *acceptor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
488 if (querylength <= MAX_STACK_READLENGTH) {
489 donor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
490 acceptor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
491 donor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
492 acceptor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
493 } else {
494 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
495 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
496 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
497 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
498 }
459499 #else
460 int donor_positions_alloc[MAX_READLENGTH+1], acceptor_positions_alloc[MAX_READLENGTH+1];
461 int donor_knowni_alloc[MAX_READLENGTH+1], acceptor_knowni_alloc[MAX_READLENGTH+1];
500 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
501 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
502 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
503 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
462504 #endif
463505
464506 debug1(printf("Splice_resolve_antisense: Getting genome at lefti %u and leftj %u (diff: %d), range %d..%d\n",
710752 }
711753 }
712754
755 #ifdef HAVE_ALLOCA
756 if (querylength <= MAX_STACK_READLENGTH) {
757 FREEA(donor_positions_alloc);
758 FREEA(acceptor_positions_alloc);
759 FREEA(donor_knowni_alloc);
760 FREEA(acceptor_knowni_alloc);
761 } else {
762 FREE(donor_positions_alloc);
763 FREE(acceptor_positions_alloc);
764 FREE(donor_knowni_alloc);
765 FREE(acceptor_knowni_alloc);
766 }
767 #else
768 FREE(donor_positions_alloc);
769 FREE(acceptor_positions_alloc);
770 FREE(donor_knowni_alloc);
771 FREE(acceptor_knowni_alloc);
772 #endif
773
774
713775 debug1(printf("best_knowni_i is %d and best_knowni_j is %d\n",*best_knowni_i,*best_knowni_j));
714776
715777 if (*best_prob_i > 0.95 && *best_prob_j > 0.70) {
771833 int donori_nsites, acceptorj_nsites, antiacceptori_nsites, antidonorj_nsites;
772834 int *donori_positions, *acceptorj_positions, *antiacceptori_positions, *antidonorj_positions;
773835 int *donori_knowni, *acceptorj_knowni, *antiacceptori_knowni, *antidonorj_knowni;
836 int *donor_positions_alloc, *acceptor_positions_alloc, *donor_knowni_alloc, *acceptor_knowni_alloc;
837
774838
775839 #ifdef HAVE_ALLOCA
776 int *donor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
777 int *acceptor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
778 int *donor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
779 int *acceptor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
840 if (querylength <= MAX_STACK_READLENGTH) {
841 donor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
842 acceptor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
843 donor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
844 acceptor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
845 } else {
846 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
847 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
848 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
849 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
850 }
780851 #else
781 int donor_positions_alloc[MAX_READLENGTH+1], acceptor_positions_alloc[MAX_READLENGTH+1];
782 int donor_knowni_alloc[MAX_READLENGTH+1], acceptor_knowni_alloc[MAX_READLENGTH+1];
783 #endif
784
852 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
853 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
854 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
855 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
856 #endif
785857
786858 debug1(printf("Splice_solve_single: Getting genome at lefti %u and leftj %u (diff: %d)\n",
787859 segmenti_left,segmentj_left,segmentj_left-segmenti_left));
10831155
10841156 if (sufficient1p && sufficient2p) {
10851157 *nhits += 1;
1086 return List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmenti_nmismatches,best_segmentj_nmismatches,
1158 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmenti_nmismatches,best_segmentj_nmismatches,
10871159 donor,acceptor,best_donor_prob,best_acceptor_prob,
10881160 /*distance*/segmentj_left - segmenti_left,
10891161 /*shortdistancep*/true,splicing_penalty,querylength,
10931165 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
10941166 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
10951167 sarrayp));
1168 /* return hits; */
10961169 } else if (subs_or_indels_p == true) {
10971170 if (donor != NULL) Substring_free(&donor);
10981171 if (acceptor != NULL) Substring_free(&acceptor);
1099 return hits;
1172 /* return hits; */
11001173 } else if (donor_support < LOWPROB_SUPPORT || acceptor_support < LOWPROB_SUPPORT) {
11011174 if (donor != NULL) Substring_free(&donor);
11021175 if (acceptor != NULL) Substring_free(&acceptor);
1103 return hits;
1176 /* return hits; */
11041177 } else if (sufficient1p || sufficient2p) {
11051178 *lowprob = List_push(*lowprob,
11061179 (void *) Stage3end_new_splice(&(*found_score),best_segmenti_nmismatches,best_segmentj_nmismatches,
11131186 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
11141187 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
11151188 sarrayp));
1116 return hits;
1189 /* return hits; */
11171190 } else {
11181191 if (donor != NULL) Substring_free(&donor);
11191192 if (acceptor != NULL) Substring_free(&acceptor);
1193 /* ? return hits; */
11201194 }
11211195 }
11221196
11531227 sufficient2p = sufficient_splice_prob_local(donor_support,best_segmentj_nmismatches,best_donor_prob);
11541228 if (sufficient1p && sufficient2p) {
11551229 *nhits += 1;
1156 return List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmentj_nmismatches,best_segmenti_nmismatches,
1230 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmentj_nmismatches,best_segmenti_nmismatches,
11571231 donor,acceptor,best_donor_prob,best_acceptor_prob,
11581232 /*distance*/segmentj_left - segmenti_left,
11591233 /*shortdistancep*/true,splicing_penalty,querylength,
11631237 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
11641238 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
11651239 sarrayp));
1240 /* return hits; */
11661241 } else if (subs_or_indels_p == true) {
11671242 if (donor != NULL) Substring_free(&donor);
11681243 if (acceptor != NULL) Substring_free(&acceptor);
1169 return hits;
1244 /* return hits; */
11701245 } else if (donor_support < LOWPROB_SUPPORT || acceptor_support < LOWPROB_SUPPORT) {
11711246 if (donor != NULL) Substring_free(&donor);
11721247 if (acceptor != NULL) Substring_free(&acceptor);
1173 return hits;
1248 /* return hits; */
11741249 } else if (sufficient1p || sufficient2p) {
11751250 *lowprob = List_push(*lowprob,
11761251 (void *) Stage3end_new_splice(&(*found_score),best_segmentj_nmismatches,best_segmenti_nmismatches,
11831258 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
11841259 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
11851260 sarrayp));
1186 return hits;
1261 /* return hits; */
11871262 } else {
11881263 if (donor != NULL) Substring_free(&donor);
11891264 if (acceptor != NULL) Substring_free(&acceptor);
1190 return hits;
1191 }
1192 }
1193 }
1194 }
1195 }
1196
1197 debug1(printf("Splice_solve_single_sense fail\n"));
1265 /* ? return hits; */
1266 }
1267 }
1268 }
1269 }
1270 }
1271
1272 #ifdef HAVE_ALLOCA
1273 if (querylength <= MAX_STACK_READLENGTH) {
1274 FREEA(donor_positions_alloc);
1275 FREEA(acceptor_positions_alloc);
1276 FREEA(donor_knowni_alloc);
1277 FREEA(acceptor_knowni_alloc);
1278 } else {
1279 FREE(donor_positions_alloc);
1280 FREE(acceptor_positions_alloc);
1281 FREE(donor_knowni_alloc);
1282 FREE(acceptor_knowni_alloc);
1283 }
1284 #else
1285 FREE(donor_positions_alloc);
1286 FREE(acceptor_positions_alloc);
1287 FREE(donor_knowni_alloc);
1288 FREE(acceptor_knowni_alloc);
1289 #endif
1290
11981291 return hits;
11991292 }
12001293
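Review note: the hunks above replace each early `return hits;` with a fall-through to one cleanup block, so the scratch arrays are released on every path, and they choose stack or heap storage by comparing querylength with MAX_STACK_READLENGTH. The following is a minimal standalone sketch of that pattern, not the patch's code. It swaps in a fixed-size local array for ALLOCA so that it stays portable C, and the names `MAX_STACK_N` and `sum_of_squares` are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical threshold; stands in for MAX_STACK_READLENGTH in the diff. */
#define MAX_STACK_N 8

/* Returns the sum of i*i for i in 0..n, or -1 if n is negative or
   the heap allocation fails.  The scratch buffer lives on the stack
   for small n and on the heap otherwise. */
long sum_of_squares (int n) {
  int stack_buf[MAX_STACK_N + 1];   /* used only when n <= MAX_STACK_N */
  int *buf;
  long total = 0;
  int i;

  if (n < 0) {
    return -1;                      /* nothing allocated yet, safe to return */
  }

  if (n <= MAX_STACK_N) {
    buf = stack_buf;
  } else {
    buf = (int *) malloc((size_t) (n + 1) * sizeof(int));
    if (buf == NULL) {
      return -1;                    /* allocation failed, nothing to free */
    }
  }

  for (i = 0; i <= n; i++) {
    buf[i] = i * i;
  }
  for (i = 0; i <= n; i++) {
    total += buf[i];                /* no early return past this point */
  }

  /* Single cleanup point: release only what came from the heap. */
  if (buf != stack_buf) {
    free(buf);
  }
  return total;
}
```

Because every path after the allocation reaches the same cleanup code, adding a new branch in the middle cannot leak the heap buffer. That appears to be the motivation for commenting out the per-branch returns in the patch.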
12341327 int donori_nsites, acceptorj_nsites, antiacceptori_nsites, antidonorj_nsites;
12351328 int *donori_positions, *acceptorj_positions, *antiacceptori_positions, *antidonorj_positions;
12361329 int *donori_knowni, *acceptorj_knowni, *antiacceptori_knowni, *antidonorj_knowni;
1330 int *donor_positions_alloc, *acceptor_positions_alloc, *donor_knowni_alloc, *acceptor_knowni_alloc;
1331
12371332
12381333 #ifdef HAVE_ALLOCA
1239 int *donor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
1240 int *acceptor_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
1241 int *donor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
1242 int *acceptor_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
1334 if (querylength <= MAX_STACK_READLENGTH) {
1335 donor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
1336 acceptor_positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
1337 donor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
1338 acceptor_knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
1339 } else {
1340 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1341 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1342 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1343 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1344 }
12431345 #else
1244 int donor_positions_alloc[MAX_READLENGTH+1], acceptor_positions_alloc[MAX_READLENGTH+1];
1245 int donor_knowni_alloc[MAX_READLENGTH+1], acceptor_knowni_alloc[MAX_READLENGTH+1];
1346 donor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1347 acceptor_positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1348 donor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
1349 acceptor_knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
12461350 #endif
12471351
12481352 debug1(printf("Splice_solve_single: Getting genome at lefti %u and leftj %u (diff: %d)\n",
15451649
15461650 if (sufficient1p && sufficient2p) {
15471651 *nhits += 1;
1548 return List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmenti_nmismatches,best_segmentj_nmismatches,
1652 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmenti_nmismatches,best_segmentj_nmismatches,
15491653 donor,acceptor,best_donor_prob,best_acceptor_prob,
15501654 /*distance*/segmentj_left - segmenti_left,
15511655 /*shortdistancep*/true,splicing_penalty,querylength,
15551659 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
15561660 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
15571661 sarrayp));
1662 /* return hits; */
15581663 } else if (subs_or_indels_p == true) {
15591664 if (donor != NULL) Substring_free(&donor);
15601665 if (acceptor != NULL) Substring_free(&acceptor);
1561 return hits;
1666 /* return hits; */
15621667 } else if (donor_support < LOWPROB_SUPPORT || acceptor_support < LOWPROB_SUPPORT) {
15631668 if (donor != NULL) Substring_free(&donor);
15641669 if (acceptor != NULL) Substring_free(&acceptor);
1565 return hits;
1670 /* return hits; */
15661671 } else if (sufficient1p || sufficient2p) {
15671672 *lowprob = List_push(*lowprob,
15681673 (void *) Stage3end_new_splice(&(*found_score),best_segmenti_nmismatches,best_segmentj_nmismatches,
15751680 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
15761681 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
15771682 sarrayp));
1578 return hits;
1683 /* return hits; */
15791684 } else {
15801685 if (donor != NULL) Substring_free(&donor);
15811686 if (acceptor != NULL) Substring_free(&acceptor);
1687 /* ? return hits; */
15821688 }
15831689 }
15841690
16151721 sufficient2p = sufficient_splice_prob_local(donor_support,best_segmentj_nmismatches,best_donor_prob);
16161722 if (sufficient1p && sufficient2p) {
16171723 *nhits += 1;
1618 return List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmentj_nmismatches,best_segmenti_nmismatches,
1724 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),best_segmentj_nmismatches,best_segmenti_nmismatches,
16191725 donor,acceptor,best_donor_prob,best_acceptor_prob,
16201726 /*distance*/segmentj_left - segmenti_left,
16211727 /*shortdistancep*/true,splicing_penalty,querylength,
16251731 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
16261732 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
16271733 sarrayp));
1734 /* return hits; */
16281735 } else if (subs_or_indels_p == true) {
16291736 if (donor != NULL) Substring_free(&donor);
16301737 if (acceptor != NULL) Substring_free(&acceptor);
1631 return hits;
1738 /* return hits; */
16321739 } else if (donor_support < LOWPROB_SUPPORT || acceptor_support < LOWPROB_SUPPORT) {
16331740 if (donor != NULL) Substring_free(&donor);
16341741 if (acceptor != NULL) Substring_free(&acceptor);
1635 return hits;
1742 /* return hits; */
16361743 } else if (sufficient1p || sufficient2p) {
16371744 *lowprob = List_push(*lowprob,
16381745 (void *) Stage3end_new_splice(&(*found_score),best_segmentj_nmismatches,best_segmenti_nmismatches,
16451752 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
16461753 /*copy_donor_p*/false,/*copy_acceptor_p*/false,first_read_p,sensedir,
16471754 sarrayp));
1648 return hits;
1755 /* return hits; */
16491756 } else {
16501757 if (donor != NULL) Substring_free(&donor);
16511758 if (acceptor != NULL) Substring_free(&acceptor);
1652 return hits;
1653 }
1654 }
1655 }
1656 }
1657 }
1658
1659 debug1(printf("Splice_solve_single_antisense fail\n"));
1759 /* ? return hits; */
1760 }
1761 }
1762 }
1763 }
1764 }
1765
1766 #ifdef HAVE_ALLOCA
1767 if (querylength <= MAX_STACK_READLENGTH) {
1768 FREEA(donor_positions_alloc);
1769 FREEA(acceptor_positions_alloc);
1770 FREEA(donor_knowni_alloc);
1771 FREEA(acceptor_knowni_alloc);
1772 } else {
1773 FREE(donor_positions_alloc);
1774 FREE(acceptor_positions_alloc);
1775 FREE(donor_knowni_alloc);
1776 FREE(acceptor_knowni_alloc);
1777 }
1778 #else
1779 FREE(donor_positions_alloc);
1780 FREE(acceptor_positions_alloc);
1781 FREE(donor_knowni_alloc);
1782 FREE(acceptor_knowni_alloc);
1783 #endif
1784
16601785 return hits;
16611786 }
16621787
1663
1664 #if 0
1665 List_T
1666 Splice_solve_double (int *found_score, int *nhits, List_T hits, List_T *lowprob,
1667
1668 bool *segmenti_usedp, bool *segmentm_usedp, bool *segmentj_usedp,
1669 Univcoord_T segmenti_left, Univcoord_T segmentm_left, Univcoord_T segmentj_left,
1670 Chrnum_T segmenti_chrnum, Univcoord_T segmenti_chroffset,
1671 Univcoord_T segmenti_chrhigh, Chrpos_T segmenti_chrlength,
1672 Chrnum_T segmentm_chrnum, Univcoord_T segmentm_chroffset,
1673 Univcoord_T segmentm_chrhigh, Chrpos_T segmentm_chrlength,
1674 Chrnum_T segmentj_chrnum, Univcoord_T segmentj_chroffset,
1675 Univcoord_T segmentj_chrhigh, Chrpos_T segmentj_chrlength,
1676
1677 int querylength, Compress_T query_compress,
1678 int *segmenti_donor_knownpos, int *segmentm_acceptor_knownpos, int *segmentm_donor_knownpos, int *segmentj_acceptor_knownpos,
1679 int *segmentj_antidonor_knownpos, int *segmentm_antiacceptor_knownpos, int *segmentm_antidonor_knownpos, int *segmenti_antiacceptor_knownpos,
1680 int *segmenti_donor_knowni, int *segmentm_acceptor_knowni, int *segmentm_donor_knowni, int *segmentj_acceptor_knowni,
1681 int *segmentj_antidonor_knowni, int *segmentm_antiacceptor_knowni, int *segmentm_antidonor_knowni, int *segmenti_antiacceptor_knowni,
1682 int segmenti_donor_nknown, int segmentm_acceptor_nknown, int segmentm_donor_nknown, int segmentj_acceptor_nknown,
1683 int segmentj_antidonor_nknown, int segmentm_antiacceptor_nknown, int segmentm_antidonor_nknown, int segmenti_antiacceptor_nknown,
1684 int splicing_penalty, int max_mismatches_allowed, bool plusp, int genestrand,
1685 bool subs_or_indels_p, bool sarrayp) {
1686 Substring_T donor, shortexon, acceptor;
1687 int best_splice_pos_1, best_splice_pos_2, splice_pos_start, splice_pos_end, splice_pos_1, splice_pos_2;
1688 int i, a, b, j;
1689
1690 int best_nmismatches, nmismatches;
1691 int best_segmenti_nmismatches, best_segmentm_nmismatches, best_segmentj_nmismatches,
1692 segmenti_nmismatches, segmentm_nmismatches, segmentj_nmismatches;
1693 int donor_support, acceptor_support, middle_support;
1694 Univcoord_T best_donor1_splicecoord, best_acceptor1_splicecoord, best_donor2_splicecoord, best_acceptor2_splicecoord;
1695 int best_donor1_knowni, best_acceptor1_knowni, best_donor2_knowni, best_acceptor2_knowni;
1696 double best_prob, best_donor1_prob, best_acceptor1_prob, best_donor2_prob, best_acceptor2_prob,
1697 probi, proba, probb, probj;
1698 bool sufficient1p, sufficient2p, sufficient3p, sufficient4p, orig_plusp, matchp;
1699 int sensedir;
1700
1701 int donori_nsites, acceptora_nsites, donorb_nsites, acceptorj_nsites,
1702 antiacceptori_nsites, antidonora_nsites, antiacceptorb_nsites, antidonorj_nsites;
1703 int *donori_positions, *acceptora_positions, *donorb_positions, *acceptorj_positions,
1704 *antiacceptori_positions, *antidonora_positions, *antiacceptorb_positions, *antidonorj_positions;
1705 int *donori_knowni, *acceptora_knowni, *donorb_knowni, *acceptorj_knowni,
1706 *antiacceptori_knowni, *antidonora_knowni, *antiacceptorb_knowni, *antidonorj_knowni;
1707
1708 #ifdef HAVE_ALLOCA
1709 int *donor1_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
1710 int *acceptor1_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
1711 int *donor2_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
1712 int *acceptor2_positions_alloc = (int *) alloca((querylength+1)*sizeof(int));
1713 int *donor1_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
1714 int *acceptor1_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
1715 int *donor2_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
1716 int *acceptor2_knowni_alloc = (int *) alloca((querylength+1)*sizeof(int));
1717 #else
1718 int donor1_positions_alloc[MAX_READLENGTH+1], acceptor1_positions_alloc[MAX_READLENGTH+1],
1719 donor2_positions_alloc[MAX_READLENGTH+1], acceptor2_positions_alloc[MAX_READLENGTH+1];
1720 int donor1_knowni_alloc[MAX_READLENGTH+1], acceptor1_knowni_alloc[MAX_READLENGTH+1],
1721 donor2_knowni_alloc[MAX_READLENGTH+1], acceptor2_knowni_alloc[MAX_READLENGTH+1];
1722 #endif
1723
1724
1725 debug2(printf("Splice_solve_double: Getting genome at lefti %u, leftm %u, and leftj %u\n",
1726 segmenti_left,segmentm_left,segmentj_left));
1727
1728 *nhits = 0;
1729 splice_pos_start = min_shortend;
1730 splice_pos_end = querylength - min_shortend; /* ? off by 1, so -l 3 allows only ends of up to 2 */
1731
1732 if (splice_pos_start <= splice_pos_end) {
1733 /* Originally from plus strand. No complement. */
1734 /* Sense (End 1 to End 2) or Antisense (End 5 to End 6) */
1735
1736 /* Segment i */
1737 if (novelsplicingp && segmenti_left + splice_pos_start >= DONOR_MODEL_LEFT_MARGIN) {
1738 donori_nsites = Genome_donor_positions(donor1_positions_alloc,donor1_knowni_alloc,
1739 segmenti_donor_knownpos,segmenti_donor_knowni,
1740 segmenti_left,splice_pos_start,splice_pos_end);
1741 donori_positions = donor1_positions_alloc;
1742 donori_knowni = donor1_knowni_alloc;
1743 } else {
1744 donori_nsites = segmenti_donor_nknown;
1745 donori_positions = segmenti_donor_knownpos;
1746 donori_knowni = segmenti_donor_knowni;
1747 }
1748
1749 #ifdef DEBUG2
1750 printf("Found %d donori sites:",donori_nsites);
1751 for (i = 0; i < donori_nsites; i++) {
1752 printf(" %d",donori_positions[i]);
1753 if (donori_knowni[i] >= 0) {
1754 printf(" (%d)",donori_knowni[i]);
1755 }
1756 }
1757 printf("\n");
1758 #endif
1759
1760 /* Segment m1 */
1761 if (novelsplicingp && segmentm_left + splice_pos_start >= ACCEPTOR_MODEL_LEFT_MARGIN) {
1762 acceptora_nsites = Genome_acceptor_positions(acceptor1_positions_alloc,acceptor1_knowni_alloc,
1763 segmentm_acceptor_knownpos,segmentm_acceptor_knowni,
1764 segmentm_left,splice_pos_start,splice_pos_end);
1765 acceptora_positions = acceptor1_positions_alloc;
1766 acceptora_knowni = acceptor1_knowni_alloc;
1767 } else {
1768 acceptora_nsites = segmentm_acceptor_nknown;
1769 acceptora_positions = segmentm_acceptor_knownpos;
1770 acceptora_knowni = segmentm_acceptor_knowni;
1771 }
1772
1773 #ifdef DEBUG2
1774 printf("Found %d acceptora sites:",acceptora_nsites);
1775 for (i = 0; i < acceptora_nsites; i++) {
1776 printf(" %d",acceptora_positions[i]);
1777 if (acceptora_knowni[i] >= 0) {
1778 printf(" (%d)",acceptora_knowni[i]);
1779 }
1780 }
1781 printf("\n");
1782 #endif
1783
1784 /* Segment m2 */
1785 if (novelsplicingp && segmentm_left + splice_pos_start >= DONOR_MODEL_LEFT_MARGIN) {
1786 donorb_nsites = Genome_donor_positions(donor2_positions_alloc,donor2_knowni_alloc,
1787 segmentm_donor_knownpos,segmentm_donor_knowni,
1788 segmentm_left,splice_pos_start,splice_pos_end);
1789 donorb_positions = donor2_positions_alloc;
1790 donorb_knowni = donor2_knowni_alloc;
1791 } else {
1792 donorb_nsites = segmentm_donor_nknown;
1793 donorb_positions = segmentm_donor_knownpos;
1794 donorb_knowni = segmentm_donor_knowni;
1795 }
1796
1797 #ifdef DEBUG2
1798 printf("Found %d donorb sites:",donorb_nsites);
1799 for (i = 0; i < donorb_nsites; i++) {
1800 printf(" %d",donorb_positions[i]);
1801 if (donorb_knowni[i] >= 0) {
1802 printf(" (%d)",donorb_knowni[i]);
1803 }
1804 }
1805 printf("\n");
1806 #endif
1807
1808 /* Segment j */
1809 if (novelsplicingp && segmentj_left + splice_pos_start >= ACCEPTOR_MODEL_LEFT_MARGIN) {
1810 acceptorj_nsites = Genome_acceptor_positions(acceptor2_positions_alloc,acceptor2_knowni_alloc,
1811 segmentj_acceptor_knownpos,segmentj_acceptor_knowni,
1812 segmentj_left,splice_pos_start,splice_pos_end);
1813 acceptorj_positions = acceptor2_positions_alloc;
1814 acceptorj_knowni = acceptor2_knowni_alloc;
1815 } else {
1816 acceptorj_nsites = segmentj_acceptor_nknown;
1817 acceptorj_positions = segmentj_acceptor_knownpos;
1818 acceptorj_knowni = segmentj_acceptor_knowni;
1819 }
1820
1821 #ifdef DEBUG2
1822 printf("Found %d acceptorj sites:",acceptorj_nsites);
1823 for (i = 0; i < acceptorj_nsites; i++) {
1824 printf(" %d",acceptorj_positions[i]);
1825 if (acceptorj_knowni[i] >= 0) {
1826 printf(" (%d)",acceptorj_knowni[i]);
1827 }
1828 }
1829 printf("\n");
1830 #endif
1831
1832 best_nmismatches = max_mismatches_allowed;
1833 best_prob = 0.0;
1834 orig_plusp = true;
1835
1836 i = a = b = j = 0;
1837 while (i < donori_nsites && a < acceptora_nsites) {
1838 if ((splice_pos_1 = donori_positions[i]) < acceptora_positions[a]) {
1839 i++;
1840 } else if (splice_pos_1 > acceptora_positions[a]) {
1841 a++;
1842 } else {
1843 while (b < donorb_nsites && donorb_positions[b] <= splice_pos_1) {
1844 b++;
1845 }
1846 while (j < acceptorj_nsites && acceptorj_positions[j] <= splice_pos_1) {
1847 j++;
1848 }
1849 matchp = false;
1850 while (b < donorb_nsites && j < acceptorj_nsites && matchp == false) {
1851 if ((splice_pos_2 = donorb_positions[b]) < acceptorj_positions[j]) {
1852 b++;
1853 } else if (splice_pos_2 > acceptorj_positions[j]) {
1854 j++;
1855 } else {
1856 segmenti_nmismatches = Genome_count_mismatches_substring(query_compress,/*left*/segmenti_left,/*pos5*/0,/*pos3*/splice_pos_1,
1857 plusp,genestrand);
1858 segmentm_nmismatches = Genome_count_mismatches_substring(query_compress,/*left*/segmentm_left,/*pos5*/splice_pos_1,/*pos3*/splice_pos_2,
1859 plusp,genestrand);
1860 segmentj_nmismatches = Genome_count_mismatches_substring(query_compress,/*left*/segmentj_left,/*pos5*/splice_pos_2,/*pos3*/querylength,
1861 plusp,genestrand);
1862 if ((nmismatches = segmenti_nmismatches + segmentm_nmismatches + segmentj_nmismatches) <= best_nmismatches) {
1863 if (donori_knowni[i] >= 0) {
1864 probi = 1.0; /* Needs to be 1.0 for output */
1865 } else {
1866 probi = Maxent_hr_donor_prob(segmenti_left + splice_pos_1,segmenti_chroffset);
1867 }
1868
1869 if (acceptora_knowni[a] >= 0) {
1870 proba = 1.0; /* Needs to be 1.0 for output */
1871 } else {
1872 proba = Maxent_hr_acceptor_prob(segmentm_left + splice_pos_1,segmentm_chroffset);
1873 }
1874
1875 if (donorb_knowni[b] >= 0) {
1876 probb = 1.0; /* Needs to be 1.0 for output */
1877 } else {
1878 probb = Maxent_hr_donor_prob(segmentm_left + splice_pos_2,segmentm_chroffset);
1879 }
1880
1881 if (acceptorj_knowni[j] >= 0) {
1882 probj = 1.0; /* Needs to be 1.0 for output */
1883 } else {
1884 probj = Maxent_hr_acceptor_prob(segmentj_left + splice_pos_2,segmentj_chroffset);
1885 }
1886
1887 debug2(
1888 if (plusp == true) {
1889 printf("plus sense splice_pos %d, %d, i.donor %f, m.acceptor %f, m.donor %f, j.acceptor %f\n",
1890 splice_pos_1,splice_pos_2,probi,proba,probb,probj);
1891 } else {
1892 printf("minus antisense splice_pos %d %d, i.donor %f, m.acceptor %f, m.donor %f, j.acceptor %f\n",
1893 splice_pos_1,splice_pos_2,probi,proba,probb,probj);
1894 });
1895
1896 if (nmismatches < best_nmismatches ||
1897 (nmismatches == best_nmismatches && probi + proba + probb + probj > best_prob)) {
1898 /* Success */
1899 best_nmismatches = nmismatches;
1900 best_prob = probi + proba + probb + probj;
1901
1902 best_donor1_splicecoord = segmenti_left + splice_pos_1;
1903 best_acceptor1_splicecoord = segmentm_left + splice_pos_1;
1904 best_donor2_splicecoord = segmentm_left + splice_pos_2;
1905 best_acceptor2_splicecoord = segmentj_left + splice_pos_2;
1906 best_donor1_knowni = donori_knowni[i];
1907 best_acceptor1_knowni = acceptora_knowni[a];
1908 best_donor2_knowni = donorb_knowni[b];
1909 best_acceptor2_knowni = acceptorj_knowni[j];
1910 best_donor1_prob = probi;
1911 best_acceptor1_prob = proba;
1912 best_donor2_prob = probb;
1913 best_acceptor2_prob = probj;
1914 best_splice_pos_1 = splice_pos_1;
1915 best_splice_pos_2 = splice_pos_2;
1916 best_segmenti_nmismatches = segmenti_nmismatches;
1917 best_segmentm_nmismatches = segmentm_nmismatches;
1918 best_segmentj_nmismatches = segmentj_nmismatches;
1919 }
1920 }
1921 /* b++; j++; Don't advance b or j, so next i/a can match */
1922 matchp = true;
1923 }
1924 }
1925 i++;
1926 a++;
1927 }
1928 }
1929
1930
1931 /* Originally from minus strand. Complement. */
1932 /* Antisense (End 7 to End 8) or Sense (End 3 to End 4) */
1933
1934 /* Segment i */
1935 if (novelsplicingp && segmenti_left + splice_pos_start >= ACCEPTOR_MODEL_RIGHT_MARGIN) {
1936 antiacceptori_nsites = Genome_antiacceptor_positions(acceptor1_positions_alloc,acceptor1_knowni_alloc,
1937 segmenti_antiacceptor_knownpos,segmenti_antiacceptor_knowni,
1938 segmenti_left,splice_pos_start,splice_pos_end);
1939 antiacceptori_positions = acceptor1_positions_alloc;
1940 antiacceptori_knowni = acceptor1_knowni_alloc;
1941 } else {
1942 antiacceptori_nsites = segmenti_antiacceptor_nknown;
1943 antiacceptori_positions = segmenti_antiacceptor_knownpos;
1944 antiacceptori_knowni = segmenti_antiacceptor_knowni;
1945 }
1946
1947 #ifdef DEBUG2
1948 printf("Found %d antiacceptori sites:",antiacceptori_nsites);
1949 for (i = 0; i < antiacceptori_nsites; i++) {
1950 printf(" %d",antiacceptori_positions[i]);
1951 if (antiacceptori_knowni[i] >= 0) {
1952 printf(" (%d)",antiacceptori_knowni[i]);
1953 }
1954 }
1955 printf("\n");
1956 #endif
1957
1958 /* Segment m1 */
1959 if (novelsplicingp && segmentm_left + splice_pos_start >= DONOR_MODEL_RIGHT_MARGIN) {
1960 antidonora_nsites = Genome_antidonor_positions(donor1_positions_alloc,donor1_knowni_alloc,
1961 segmentm_antidonor_knownpos,segmentm_antidonor_knowni,
1962 segmentm_left,splice_pos_start,splice_pos_end);
1963 antidonora_positions = donor1_positions_alloc;
1964 antidonora_knowni = donor1_knowni_alloc;
1965 } else {
1966 antidonora_nsites = segmentm_antidonor_nknown;
1967 antidonora_positions = segmentm_antidonor_knownpos;
1968 antidonora_knowni = segmentm_antidonor_knowni;
1969 }
1970
1971 #ifdef DEBUG2
1972 printf("Found %d antidonora sites:",antidonora_nsites);
1973 for (i = 0; i < antidonora_nsites; i++) {
1974 printf(" %d",antidonora_positions[i]);
1975 if (antidonora_knowni[i] >= 0) {
1976 printf(" (%d)",antidonora_knowni[i]);
1977 }
1978 }
1979 printf("\n");
1980 #endif
1981
1982 /* Segment m2 */
1983 if (novelsplicingp && segmentm_left + splice_pos_start >= ACCEPTOR_MODEL_RIGHT_MARGIN) {
1984 antiacceptorb_nsites = Genome_antiacceptor_positions(acceptor2_positions_alloc,acceptor2_knowni_alloc,
1985 segmentm_antiacceptor_knownpos,segmentm_antiacceptor_knowni,
1986 segmentm_left,splice_pos_start,splice_pos_end);
1987 antiacceptorb_positions = acceptor2_positions_alloc;
1988 antiacceptorb_knowni = acceptor2_knowni_alloc;
1989 } else {
1990 antiacceptorb_nsites = segmentm_antiacceptor_nknown;
1991 antiacceptorb_positions = segmentm_antiacceptor_knownpos;
1992 antiacceptorb_knowni = segmentm_antiacceptor_knowni;
1993 }
1994
1995 #ifdef DEBUG2
1996 printf("Found %d antiacceptorb sites:",antiacceptorb_nsites);
1997 for (i = 0; i < antiacceptorb_nsites; i++) {
1998 printf(" %d",antiacceptorb_positions[i]);
1999 if (antiacceptorb_knowni[i] >= 0) {
2000 printf(" (%d)",antiacceptorb_knowni[i]);
2001 }
2002 }
2003 printf("\n");
2004 #endif
2005
2006 /* Segment j */
2007 if (novelsplicingp && segmentj_left + splice_pos_start >= DONOR_MODEL_RIGHT_MARGIN) {
2008 antidonorj_nsites = Genome_antidonor_positions(donor2_positions_alloc,donor2_knowni_alloc,
2009 segmentj_antidonor_knownpos,segmentj_antidonor_knowni,
2010 segmentj_left,splice_pos_start,splice_pos_end);
2011 antidonorj_positions = donor2_positions_alloc;
2012 antidonorj_knowni = donor2_knowni_alloc;
2013 } else {
2014 antidonorj_nsites = segmentj_antidonor_nknown;
2015 antidonorj_positions = segmentj_antidonor_knownpos;
2016 antidonorj_knowni = segmentj_antidonor_knowni;
2017 }
2018
2019 #ifdef DEBUG2
2020 printf("Found %d antidonorj sites:",antidonorj_nsites);
2021 for (i = 0; i < antidonorj_nsites; i++) {
2022 printf(" %d",antidonorj_positions[i]);
2023 if (antidonorj_knowni[i] >= 0) {
2024 printf(" (%d)",antidonorj_knowni[i]);
2025 }
2026 }
2027 printf("\n");
2028 #endif
2029
2030
2031 i = a = b = j = 0;
2032 while (i < antiacceptori_nsites && a < antidonora_nsites) {
2033 if ((splice_pos_1 = antiacceptori_positions[i]) < antidonora_positions[a]) {
2034 i++;
2035 } else if (splice_pos_1 > antidonora_positions[a]) {
2036 a++;
2037 } else {
2038 while (b < antiacceptorb_nsites && antiacceptorb_positions[b] <= splice_pos_1) {
2039 b++;
2040 }
2041 while (j < antidonorj_nsites && antidonorj_positions[j] <= splice_pos_1) {
2042 j++;
2043 }
2044 matchp = false;
2045 while (b < antiacceptorb_nsites && j < antidonorj_nsites && matchp == false) {
2046 if ((splice_pos_2 = antiacceptorb_positions[b]) < antidonorj_positions[j]) {
2047 b++;
2048 } else if (splice_pos_2 > antidonorj_positions[j]) {
2049 j++;
2050 } else {
2051 segmenti_nmismatches = Genome_count_mismatches_substring(query_compress,/*left*/segmenti_left,/*pos5*/0,/*pos3*/splice_pos_1,
2052 plusp,genestrand);
2053 segmentm_nmismatches = Genome_count_mismatches_substring(query_compress,/*left*/segmentm_left,/*pos5*/splice_pos_1,/*pos3*/splice_pos_2,
2054 plusp,genestrand);
2055 segmentj_nmismatches = Genome_count_mismatches_substring(query_compress,/*left*/segmentj_left,/*pos5*/splice_pos_2,/*pos3*/querylength,
2056 plusp,genestrand);
2057
2058 if ((nmismatches = segmenti_nmismatches + segmentm_nmismatches + segmentj_nmismatches) <= best_nmismatches) {
2059 if (antiacceptori_knowni[i] >= 0) {
2060 probi = 1.0; /* Needs to be 1.0 for output */
2061 } else {
2062 probi = Maxent_hr_antiacceptor_prob(segmenti_left + splice_pos_1,segmenti_chroffset);
2063 }
2064
2065 if (antidonora_knowni[a] >= 0) {
2066 proba = 1.0; /* Needs to be 1.0 for output */
2067 } else {
2068 proba = Maxent_hr_antidonor_prob(segmentm_left + splice_pos_1,segmentm_chroffset);
2069 }
2070
2071 if (antiacceptorb_knowni[b] >= 0) {
2072 probb = 1.0; /* Needs to be 1.0 for output */
2073 } else {
2074 probb = Maxent_hr_antiacceptor_prob(segmentm_left + splice_pos_2,segmentm_chroffset);
2075 }
2076
2077 if (antidonorj_knowni[j] >= 0) {
2078 probj = 1.0; /* Needs to be 1.0 for output */
2079 } else {
2080 probj = Maxent_hr_antidonor_prob(segmentj_left + splice_pos_2,segmentj_chroffset);
2081 }
2082
2083 debug2(
2084 if (plusp == true) {
2085 printf("plus antisense splice_pos %d, %d, i.antiacceptor %f, m.antidonor %f, m.antiacceptor %f, j.antidonor %f\n",
2086 splice_pos_1,splice_pos_2,probi,proba,probb,probj);
2087 } else {
2088 printf("minus sense splice_pos %d, %d, i.antiacceptor %f, m.antidonor %f, m.antiacceptor %f, j.antidonor %f\n",
2089 splice_pos_1,splice_pos_2,probi,proba,probb,probj);
2090 });
2091
2092 if (nmismatches < best_nmismatches ||
2093 (nmismatches == best_nmismatches && probi + proba + probb + probj > best_prob)) {
2094 /* Success */
2095 best_nmismatches = nmismatches;
2096 best_prob = probi + proba + probb + probj;
2097
2098 best_acceptor1_splicecoord = segmenti_left + splice_pos_1;
2099 best_donor1_splicecoord = segmentm_left + splice_pos_1;
2100 best_acceptor2_splicecoord = segmentm_left + splice_pos_2;
2101 best_donor2_splicecoord = segmentj_left + splice_pos_2;
2102 best_acceptor1_knowni = antiacceptori_knowni[i];
2103 best_donor1_knowni = antidonora_knowni[a];
2104 best_acceptor2_knowni = antiacceptorb_knowni[b];
2105 best_donor2_knowni = antidonorj_knowni[j];
2106 best_acceptor1_prob = probi;
2107 best_donor1_prob = proba;
2108 best_acceptor2_prob = probb;
2109 best_donor2_prob = probj;
2110 best_splice_pos_1 = splice_pos_1;
2111 best_splice_pos_2 = splice_pos_2;
2112 best_segmenti_nmismatches = segmenti_nmismatches;
2113 best_segmentm_nmismatches = segmentm_nmismatches;
2114 best_segmentj_nmismatches = segmentj_nmismatches;
2115 orig_plusp = false;
2116 }
2117 }
2118 /* b++; j++; Don't advance b or j, so next i/a can match */
2119 matchp = true;
2120 }
2121 }
2122 i++;
2123 a++;
2124 }
2125 }
2126
2127
2128 if (best_prob > 0.0) {
2129 debug2(printf("best_prob = %f at splice_pos %d and %d\n",best_prob,best_splice_pos_1,best_splice_pos_2));
2130 if (orig_plusp == true) {
2131 /* Originally from plus strand. No complement. */
2132 sensedir = (plusp == true) ? SENSE_FORWARD : SENSE_ANTI;
2133
2134 donor = Substring_new_donor(best_donor1_splicecoord,best_donor1_knowni,
2135 best_splice_pos_1,/*substring_querystart*/0,/*substring_queryend*/querylength,
2136 best_segmenti_nmismatches,
2137 best_donor1_prob,/*left*/segmenti_left,query_compress,
2138 querylength,plusp,genestrand,sensedir,
2139 segmenti_chrnum,segmenti_chroffset,segmenti_chrhigh,segmenti_chrlength);
2140
2141 shortexon = Substring_new_shortexon(best_acceptor1_splicecoord,best_acceptor1_knowni,
2142 best_donor2_splicecoord,best_donor2_knowni,
2143 /*acceptor_pos*/best_splice_pos_1,/*donor_pos*/best_splice_pos_2,best_segmentm_nmismatches,
2144 /*acceptor_prob*/best_acceptor1_prob,/*donor_prob*/best_donor2_prob,
2145 /*left*/segmentm_left,query_compress,
2146 querylength,plusp,genestrand,
2147 sensedir,/*acceptor_ambp*/false,/*donor_ambp*/false,
2148 segmentm_chrnum,segmentm_chroffset,segmentm_chrhigh,segmentm_chrlength);
2149
2150 acceptor = Substring_new_acceptor(best_acceptor2_splicecoord,best_acceptor2_knowni,
2151 best_splice_pos_2,/*substring_querystart*/0,/*substring_queryend*/querylength,
2152 best_segmentj_nmismatches,
2153 best_acceptor2_prob,/*left*/segmentj_left,query_compress,
2154 querylength,plusp,genestrand,sensedir,
2155 segmentj_chrnum,segmentj_chroffset,segmentj_chrhigh,segmentj_chrlength);
2156
2157 if (donor == NULL || shortexon == NULL || acceptor == NULL) {
2158 if (donor != NULL) Substring_free(&donor);
2159 if (shortexon != NULL) Substring_free(&shortexon);
2160 if (acceptor != NULL) Substring_free(&acceptor);
2161 } else {
2162 *segmenti_usedp = *segmentm_usedp = *segmentj_usedp = true;
2163
2164 donor_support = best_splice_pos_1;
2165 middle_support = best_splice_pos_2 - best_splice_pos_1;
2166 acceptor_support = querylength - best_splice_pos_2;
2167 sufficient1p = sufficient_splice_prob_local(donor_support,best_segmenti_nmismatches,best_donor1_prob);
2168 sufficient2p = sufficient_splice_prob_local(middle_support,best_segmentm_nmismatches,best_acceptor1_prob);
2169 sufficient3p = sufficient_splice_prob_local(middle_support,best_segmentm_nmismatches,best_donor2_prob);
2170 sufficient4p = sufficient_splice_prob_local(acceptor_support,best_segmentj_nmismatches,best_acceptor2_prob);
2171 if (sufficient1p && sufficient2p && sufficient3p && sufficient4p) {
2172 *nhits += 1;
2173 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
2174 best_donor1_prob,/*shortexonA_prob*/best_acceptor1_prob,
2175 /*shortexonD_prob*/best_donor2_prob,best_acceptor2_prob,
2176 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
2177 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
2178 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
2179 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
2180 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
2181 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
2182 splicing_penalty,querylength,sensedir,sarrayp));
2183 } else if (subs_or_indels_p == true) {
2184 /* Don't alter hits */
2185 if (donor != NULL) Substring_free(&donor);
2186 if (shortexon != NULL) Substring_free(&shortexon);
2187 if (acceptor != NULL) Substring_free(&acceptor);
2188 } else if (donor_support < LOWPROB_SUPPORT || acceptor_support < LOWPROB_SUPPORT) {
2189 if (donor != NULL) Substring_free(&donor);
2190 if (shortexon != NULL) Substring_free(&shortexon);
2191 if (acceptor != NULL) Substring_free(&acceptor);
2192 } else if ((sufficient1p || sufficient2p) && (sufficient3p || sufficient4p)) {
2193 *lowprob = List_push(*lowprob,
2194 (void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
2195 best_donor1_prob,/*shortexonA_prob*/best_acceptor1_prob,
2196 /*shortexonD_prob*/best_donor2_prob,best_acceptor2_prob,
2197 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
2198 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
2199 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
2200 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
2201 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
2202 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
2203 splicing_penalty,querylength,sensedir,sarrayp));
2204 } else {
2205 if (donor != NULL) Substring_free(&donor);
2206 if (shortexon != NULL) Substring_free(&shortexon);
2207 if (acceptor != NULL) Substring_free(&acceptor);
2208 }
2209 }
2210
2211 } else {
2212 /* Originally from minus strand. Complement. */
2213 sensedir = (plusp == true) ? SENSE_ANTI : SENSE_FORWARD;
2214
2215 donor = Substring_new_donor(best_donor2_splicecoord,best_donor2_knowni,
2216 best_splice_pos_2,/*substring_querystart*/0,/*substring_queryend*/querylength,
2217 best_segmentj_nmismatches,
2218 best_donor2_prob,/*left*/segmentj_left,query_compress,
2219 querylength,plusp,genestrand,sensedir,
2220 segmentj_chrnum,segmentj_chroffset,segmentj_chrhigh,segmentj_chrlength);
2221
2222 shortexon = Substring_new_shortexon(best_acceptor2_splicecoord,best_acceptor2_knowni,
2223 best_donor1_splicecoord,best_donor1_knowni,
2224 /*acceptor_pos*/best_splice_pos_2,/*donor_pos*/best_splice_pos_1,best_segmentm_nmismatches,
2225 /*acceptor_prob*/best_acceptor2_prob,/*donor_prob*/best_donor1_prob,
2226 /*left*/segmentm_left,query_compress,querylength,
2227 plusp,genestrand,sensedir,/*acceptor_ambp*/false,/*donor_ambp*/false,
2228 segmentm_chrnum,segmentm_chroffset,segmentm_chrhigh,segmentm_chrlength);
2229
2230 acceptor = Substring_new_acceptor(best_acceptor1_splicecoord,best_acceptor1_knowni,
2231 best_splice_pos_1,/*substring_querystart*/0,/*substring_queryend*/querylength,
2232 best_segmenti_nmismatches,
2233 best_acceptor1_prob,/*left*/segmenti_left,query_compress,
2234 querylength,plusp,genestrand,sensedir,
2235 segmenti_chrnum,segmenti_chroffset,segmenti_chrhigh,segmenti_chrlength);
2236
2237 if (donor == NULL || shortexon == NULL || acceptor == NULL) {
2238 if (donor != NULL) Substring_free(&donor);
2239 if (shortexon != NULL) Substring_free(&shortexon);
2240 if (acceptor != NULL) Substring_free(&acceptor);
2241 } else {
2242 *segmenti_usedp = *segmentm_usedp = *segmentj_usedp = true;
2243
2244 acceptor_support = best_splice_pos_1;
2245 middle_support = best_splice_pos_2 - best_splice_pos_1;
2246 donor_support = querylength - best_splice_pos_2;
2247 sufficient1p = sufficient_splice_prob_local(acceptor_support,best_segmenti_nmismatches,best_acceptor1_prob);
2248 sufficient2p = sufficient_splice_prob_local(middle_support,best_segmentm_nmismatches,best_donor1_prob);
2249 sufficient3p = sufficient_splice_prob_local(middle_support,best_segmentm_nmismatches,best_acceptor2_prob);
2250 sufficient4p = sufficient_splice_prob_local(donor_support,best_segmentj_nmismatches,best_donor2_prob);
2251 if (sufficient1p && sufficient2p && sufficient3p && sufficient4p) {
2252 *nhits += 1;
2253 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
2254 best_donor2_prob,/*shortexonA_prob*/best_acceptor2_prob,
2255 /*shortexonD_prob*/best_donor1_prob,best_acceptor1_prob,
2256 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
2257 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
2258 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
2259 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
2260 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
2261 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
2262 splicing_penalty,querylength,sensedir,sarrayp));
2263 } else if (subs_or_indels_p == true) {
2264 /* Don't alter hits */
2265 if (donor != NULL) Substring_free(&donor);
2266 if (shortexon != NULL) Substring_free(&shortexon);
2267 if (acceptor != NULL) Substring_free(&acceptor);
2268 } else if (donor_support < LOWPROB_SUPPORT || acceptor_support < LOWPROB_SUPPORT) {
2269 if (donor != NULL) Substring_free(&donor);
2270 if (shortexon != NULL) Substring_free(&shortexon);
2271 if (acceptor != NULL) Substring_free(&acceptor);
2272 } else if ((sufficient1p || sufficient2p) && (sufficient3p || sufficient4p)) {
2273 *lowprob = List_push(*lowprob,
2274 (void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
2275 best_donor2_prob,/*shortexonA_prob*/best_acceptor2_prob,
2276 /*shortexonD_prob*/best_donor1_prob,best_acceptor1_prob,
2277 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
2278 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
2279 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
2280 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
2281 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
2282 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
2283 splicing_penalty,querylength,sensedir,sarrayp));
2284 } else {
2285 if (donor != NULL) Substring_free(&donor);
2286 if (shortexon != NULL) Substring_free(&shortexon);
2287 if (acceptor != NULL) Substring_free(&acceptor);
2288 }
2289 }
2290 }
2291 }
2292 }
2293
2294 return hits;
2295 }
2296 #endif
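
The nested while loops near the end of the function above advance paired indices (i/a, then b/j) through position arrays and act only when both arrays hold the same splice position. A minimal sketch of that two-pointer intersection, assuming both arrays are sorted in ascending order without duplicates; `intersect_sorted` is a hypothetical helper written for illustration, not a function in this codebase:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the values present in both ascending arrays
   into out and returns how many were found.  out must have room for
   min(na, nb) entries. */
size_t
intersect_sorted (const int *a, size_t na, const int *b, size_t nb, int *out) {
  size_t i = 0, j = 0, n = 0;

  assert(a != NULL && b != NULL && out != NULL);
  while (i < na && j < nb) {
    if (a[i] < b[j]) {
      i++;                      /* a is behind: advance it */
    } else if (a[i] > b[j]) {
      j++;                      /* b is behind: advance it */
    } else {
      out[n++] = a[i];          /* shared position */
      i++;
      j++;
    }
  }
  return n;
}
```

Each index only moves forward, so the scan costs at most na + nb comparisons. The code in the diff additionally leaves b and j where they are after a match so that the next i/a pair can reuse them; the sketch omits that detail.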
22971788
22981789
22991790 static int
23771868 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
23781869 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
23791870 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
2380 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
2381 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
1871 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
1872 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
23821873 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
23831874 best_nmismatches = nmismatches;
23841875 }
23941885 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
23951886 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
23961887 debug7(printf("accepting distance %d, probabilities %f and %f\n",
2397 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
2398 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
1888 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
1889 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
23991890 n_good_spliceends += 1;
24001891 accepted_hits = List_push(accepted_hits,(void *) hit);
24011892 } else {
24111902 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP ||
24121903 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
24131904 debug7(printf("accepting distance %d, probabilities %f and %f\n",
2414 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
2415 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
1905 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
1906 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
24161907 n_good_spliceends += 1;
24171908 accepted_hits = List_push(accepted_hits,(void *) hit);
24181909 } else {
24821973 for (kk = ii; kk < jj; kk++) {
24831974 acceptor = Stage3end_substring_acceptor(subarray[kk]);
24841975 #ifdef LARGE_GENOMES
2485 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
1976 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
24861977 #else
2487 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
1978 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
24881979 #endif
24891980 amb_knowni = Intlist_push(amb_knowni,-1);
24901981 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
2491 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
1982 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
24921983 }
24931984
24941985 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
25502041 for (kk = ii; kk < jj; kk++) {
25512042 donor = Stage3end_substring_donor(subarray[kk]);
25522043 #ifdef LARGE_GENOMES
2553 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
2044 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
25542045 #else
2555 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
2046 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
25562047 #endif
25572048 amb_knowni = Intlist_push(amb_knowni,-1);
25582049 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
2559 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
2050 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
25602051 }
25612052
25622053 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
27052196 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
27062197 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
27072198 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
2708 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
2709 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
2199 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
2200 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
27102201 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
27112202 best_nmismatches = nmismatches;
27122203 }
27222213 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
27232214 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
27242215 debug7(printf("accepting distance %d, probabilities %f and %f\n",
2725 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
2726 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
2216 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
2217 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
27272218 n_good_spliceends += 1;
27282219 accepted_hits = List_push(accepted_hits,(void *) hit);
27292220 } else {
27392230 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP ||
27402231 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
27412232 debug7(printf("accepting distance %d, probabilities %f and %f\n",
2742 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
2743 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
2233 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
2234 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
27442235 n_good_spliceends += 1;
27452236 accepted_hits = List_push(accepted_hits,(void *) hit);
27462237 } else {
28132304 for (kk = ii; kk < jj; kk++) {
28142305 acceptor = Stage3end_substring_acceptor(subarray[kk]);
28152306 #ifdef LARGE_GENOMES
2816 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
2307 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
28172308 #else
2818 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
2309 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
28192310 #endif
28202311 amb_knowni = Intlist_push(amb_knowni,-1);
28212312 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
2822 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
2313 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
28232314 }
28242315
28252316 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
28812372 for (kk = ii; kk < jj; kk++) {
28822373 donor = Stage3end_substring_donor(subarray[kk]);
28832374 #ifdef LARGE_GENOMES
2884 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
2375 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
28852376 #else
2886 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
2377 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
28872378 #endif
28882379 amb_knowni = Intlist_push(amb_knowni,-1);
28892380 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
2890 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
2381 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
28912382 }
28922383
28932384 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
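
The hunks above accept a hit when its mismatch count is within LOCALSPLICING_NMATCHES_SLOP of the best count and its probability is within LOCALSPLICING_PROB_SLOP of the best probability. A minimal sketch of that two-pass filter follows. The slop values, the `struct candidate` type, and the assumption that the best probability is the maximum over all hits are illustrative and are not shown in the diff:

```c
#include <assert.h>
#include <stddef.h>

#define NMISMATCHES_SLOP 1   /* illustrative value, not taken from the source */
#define PROB_SLOP 0.2        /* illustrative value, not taken from the source */

/* Hypothetical stand-in for the fields read from each hit */
struct candidate {
  int nmismatches;
  double prob;
};

/* Pass 1 finds the best mismatch count and probability; pass 2 counts the
   candidates that lie within the slop of both. */
size_t
count_accepted (const struct candidate *hits, size_t n) {
  int best_nmismatches;
  double best_prob;
  size_t i, naccepted = 0;

  assert(hits != NULL && n > 0);
  best_nmismatches = hits[0].nmismatches;
  best_prob = hits[0].prob;
  for (i = 1; i < n; i++) {
    if (hits[i].nmismatches < best_nmismatches) best_nmismatches = hits[i].nmismatches;
    if (hits[i].prob > best_prob) best_prob = hits[i].prob;
  }

  for (i = 0; i < n; i++) {
    if (hits[i].nmismatches <= best_nmismatches + NMISMATCHES_SLOP &&
        hits[i].prob >= best_prob - PROB_SLOP) {
      naccepted++;
    }
  }
  return naccepted;
}
```

The second branch in the diff joins the two conditions with `||` rather than `&&`, which admits a hit that passes either test; swapping the operator in the sketch gives that looser variant.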
0 static char rcsid[] = "$Id: stage1hr.c 195972 2016-08-08 17:11:50Z twu $";
0 static char rcsid[] = "$Id: stage1hr.c 196433 2016-08-16 20:20:51Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
8888 #define MAX_INDEXSIZE 8
8989 #endif
9090
91 /* Note: MAX_READLENGTH is defined externally by configure */
92 #ifndef MAX_READLENGTH
93 #error A default value for MAX_READLENGTH was not provided to configure
94 #endif
95
9691
9792 /* MAX_NALIGNMENTS of 2 vs 1 gets 1600 improvements in 275,000 reads */
9893 /* MAX_NALIGNMENTS of 3 vs 2 gets 96 improvements in 275,000 reads */
163158 static int max_gmap_pairsearch;
164159 static int max_gmap_segments; /* Not used */
165160 static int max_gmap_improvement;
161
162 static int max_floors_readlength;
166163
167164
168165 #define A_CHAR 0x0
45604557 ptr->leftmost = ptr->rightmost = -1;
45614558 ptr->left_splice_p = ptr->right_splice_p = false;
45624559 ptr->spliceable_low_p = last_spliceable_p;
4560 /* ptr->spliceable_high_p = false; */
45634561 #if 0
45644562 ptr->leftspan = ptr->rightspan = -1;
45654563 #endif
45734571 so if segmenti->querypos3 is too high, then it is not spliceable */
45744572 if (last_querypos > query_lastpos) {
45754573 /* Not spliceable */
4574 last_spliceable_p = false;
45764575 } else if (diagonal <= last_diagonal + max_distance) {
45774576 *ptr_spliceable++ = ptr;
45784577 ptr->spliceable_high_p = last_spliceable_p = true;
45824581 so if segmenti->querypos5 is too low, then it is not spliceable */
45834582 if (first_querypos < index1part) {
45844583 /* Not spliceable */
4584 last_spliceable_p = false;
45854585 } else if (diagonal <= last_diagonal + max_distance) {
45864586 *ptr_spliceable++ = ptr;
45874587 ptr->spliceable_high_p = last_spliceable_p = true;
48444844 ptr->leftmost = ptr->rightmost = -1;
48454845 ptr->left_splice_p = ptr->right_splice_p = false;
48464846 ptr->spliceable_low_p = last_spliceable_p;
4847 ptr->spliceable_high_p = false;
48474848 #if 0
48484849 ptr->leftspan = ptr->rightspan = -1;
48494850 #endif
64486449 int sum, best_sum = querylength;
64496450 int conti, shifti;
64506451 int best_indel_pos = -1, endlength;
6451
6452 #ifdef HAVE_ALLOCA
6453 int *mismatch_positions_shift = (int *) ALLOCA((querylength+1)*sizeof(int));
6454 #else
6455 int mismatch_positions_shift[MAX_READLENGTH+1];
6456 #endif
6457
64586452 #ifdef OLD_END_INDELS
64596453 int indel_pos;
64606454 #else
64616455 int indel_pos_cont, indel_pos_shift;
64626456 #endif
6457 int *mismatch_positions_shift;
6458
6459
6460 #ifdef HAVE_ALLOCA
6461 if (querylength <= MAX_STACK_READLENGTH) {
6462 mismatch_positions_shift = (int *) ALLOCA((querylength+1)*sizeof(int));
6463 } else {
6464 mismatch_positions_shift = (int *) MALLOC((querylength+1)*sizeof(int));
6465 }
6466 #else
6467 mismatch_positions_shift = (int *) MALLOC((querylength+1)*sizeof(int));
6468 #endif
6469
64636470
64646471 debug2e(printf("Entered compute_end_indels_right with breakpoint = %d, max_mismatches_short %d\n",
64656472 breakpoint,max_mismatches_short));
66126619 }
66136620 }
66146621 }
6615 shifti--;
6616 indel_pos_shift = mismatch_positions_shift[shifti] + 1;
6622 if (--shifti >= 0) {
6623 indel_pos_shift = mismatch_positions_shift[shifti] + 1;
6624 }
66176625
66186626 } else {
66196627 sum = shifti + conti;
66356643 }
66366644 }
66376645 conti++;
6638 shifti--;
6639 indel_pos_cont = mismatch_positions_long[conti];
6640 indel_pos_shift = mismatch_positions_shift[shifti] + 1;
6646 if (--shifti >= 0) {
6647 indel_pos_cont = mismatch_positions_long[conti];
6648 indel_pos_shift = mismatch_positions_shift[shifti] + 1;
6649 }
66416650 }
66426651 }
66436652
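
The change above replaces an unconditional `shifti--` followed by an array read with a decrement that is tested before the lookup, so the array is never indexed at -1. A minimal sketch of the same guard in isolation; `step_back` and its parameters are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: steps *idx back by one.  If the new index is still
   valid, returns the position stored there plus one; otherwise returns
   prev_pos unchanged, without touching the array. */
int
step_back (const int *positions, int *idx, int prev_pos) {
  assert(positions != NULL && idx != NULL);
  if (--(*idx) >= 0) {
    return positions[*idx] + 1;
  }
  return prev_pos;
}
```

When the index runs out, the previous value is kept, which mirrors how the patched code leaves indel_pos_shift unchanged once shifti drops below zero.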
68166825 }
68176826 }
68186827 }
6819 shifti--;
6820 indel_pos_shift = mismatch_positions_shift[shifti] - sep + 1;
6828 if (--shifti >= 0) {
6829 indel_pos_shift = mismatch_positions_shift[shifti] - sep + 1;
6830 }
68216831
68226832 } else {
68236833 sum = shifti + conti;
68396849 }
68406850 }
68416851 conti++;
6842 shifti--;
6843 indel_pos_cont = mismatch_positions_long[conti];
6844 indel_pos_shift = mismatch_positions_shift[shifti] - sep + 1;
6852 if (--shifti >= 0) {
6853 indel_pos_cont = mismatch_positions_long[conti];
6854 indel_pos_shift = mismatch_positions_shift[shifti] - sep + 1;
6855 }
68456856 }
68466857 }
68476858
68716882 }
68726883 }
68736884
6885 #ifdef HAVE_ALLOCA
6886 if (querylength <= MAX_STACK_READLENGTH) {
6887 FREEA(mismatch_positions_shift);
6888 } else {
6889 FREE(mismatch_positions_shift);
6890 }
6891 #else
6892 FREE(mismatch_positions_shift);
6893 #endif
6894
68746895 debug2e(printf("compute_end_indels_right returning with nmismatches_longcont %d + nmismatches_shift %d for %d indels at indel_pos %d\n",
68756896 *nmismatches_longcont,*nmismatches_shift,*indels,best_indel_pos));
68766897
68956916 int sum, best_sum = querylength;
68966917 int conti, shifti;
68976918 int best_indel_pos = -1;
6898
6899 #ifdef HAVE_ALLOCA
6900 int *mismatch_positions_shift = (int *) ALLOCA((querylength+1)*sizeof(int));
6901 #else
6902 int mismatch_positions_shift[MAX_READLENGTH+1];
6903 #endif
6904
69056919 #ifdef OLD_END_INDELS
69066920 int indel_pos;
69076921 #else
69086922 int indel_pos_cont, indel_pos_shift;
6923 #endif
6924 int *mismatch_positions_shift;
6925
6926 #ifdef HAVE_ALLOCA
6927 if (querylength <= MAX_STACK_READLENGTH) {
6928 mismatch_positions_shift = (int *) ALLOCA((querylength+1)*sizeof(int));
6929 } else {
6930 mismatch_positions_shift = (int *) MALLOC((querylength+1)*sizeof(int));
6931 }
6932 #else
6933 mismatch_positions_shift = (int *) MALLOC((querylength+1)*sizeof(int));
69096934 #endif
69106935
69116936
70587083 }
70597084 }
70607085 }
7061 shifti--;
7062 indel_pos_shift = mismatch_positions_shift[shifti];
7086 if (--shifti >= 0) {
7087 indel_pos_shift = mismatch_positions_shift[shifti];
7088 }
70637089
70647090 } else {
70657091 sum = shifti + conti;
70807106 }
70817107 }
70827108 conti++;
7083 shifti--;
7084 indel_pos_cont = mismatch_positions_long[conti] - sep + 1;
7085 indel_pos_shift = mismatch_positions_shift[shifti];
7086
7109 if (--shifti >= 0) {
7110 indel_pos_cont = mismatch_positions_long[conti] - sep + 1;
7111 indel_pos_shift = mismatch_positions_shift[shifti];
7112 }
70877113 }
70887114 }
70897115
72597285 }
72607286 }
72617287 }
7262 shifti--;
7263 indel_pos_shift = mismatch_positions_shift[shifti];
7288 if (--shifti >= 0) {
7289 indel_pos_shift = mismatch_positions_shift[shifti];
7290 }
72647291
72657292 } else {
72667293 sum = shifti + conti;
72817308 }
72827309 }
72837310 conti++;
7284 shifti--;
7285 indel_pos_cont = mismatch_positions_long[conti] + 1;
7286 indel_pos_shift = mismatch_positions_shift[shifti];
7311 if (--shifti >= 0) {
7312 indel_pos_cont = mismatch_positions_long[conti] + 1;
7313 indel_pos_shift = mismatch_positions_shift[shifti];
7314 }
72877315 }
72887316 }
72897317
73127340 }
73137341 }
73147342
7343 #ifdef HAVE_ALLOCA
7344 if (querylength <= MAX_STACK_READLENGTH) {
7345 FREEA(mismatch_positions_shift);
7346 } else {
7347 FREE(mismatch_positions_shift);
7348 }
7349 #else
7350 FREE(mismatch_positions_shift);
7351 #endif
73157352
73167353 debug2e(printf("compute_end_indels_left returning with nmismatches_cont %d + nmismatches_shift %d for %d indels at indel_pos %d\n",
73177354 *nmismatches_longcont,*nmismatches_shift,*indels,best_indel_pos));
73427379 int indels, query_indel_pos, indel_pos, breakpoint;
73437380 int nmismatches, nmismatches_long, nmismatches_longcont, nmismatches_shift;
73447381 int nmismatches1, nmismatches2;
7382 int *mismatch_positions;
7383
73457384
73467385 #ifdef HAVE_ALLOCA
7347 int *mismatch_positions = (int *) ALLOCA(querylength*sizeof(int));
7348 #else
7349 int mismatch_positions[MAX_READLENGTH];
7386 if (querylength <= MAX_STACK_READLENGTH) {
7387 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
7388 } else {
7389 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
7390 }
7391 #else
7392 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
73507393 #endif
73517394
73527395
74847527 }
74857528 }
74867529
7530 #ifdef HAVE_ALLOCA
7531 if (querylength <= MAX_STACK_READLENGTH) {
7532 FREEA(mismatch_positions);
7533 } else {
7534 FREE(mismatch_positions);
7535 }
7536 #else
7537 FREE(mismatch_positions);
7538 #endif
7539
74877540 return hits;
74887541 }
74897542
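
The functions above now compare querylength against MAX_STACK_READLENGTH, use the stack for short reads and the heap for long ones, and release the buffer with the matching call. A minimal sketch of that pattern follows. It swaps in a fixed-size local array for ALLOCA so that it compiles without `<alloca.h>`; `STACK_LIMIT` and `count_mismatches` are hypothetical names, and the limit of 300 is illustrative:

```c
#include <assert.h>
#include <stdlib.h>

#define STACK_LIMIT 300   /* hypothetical stand-in for MAX_STACK_READLENGTH */

/* Records the positions where two equal-length sequences differ in a scratch
   buffer and returns how many there are, or -1 if allocation fails. */
int
count_mismatches (const char *query, const char *ref, int length) {
  int stackbuf[STACK_LIMIT + 1];
  int *positions;
  int n = 0, i;

  assert(query != NULL && ref != NULL && length >= 0);
  if (length <= STACK_LIMIT) {
    positions = stackbuf;       /* short read: stack storage */
  } else {
    positions = malloc(((size_t) length + 1) * sizeof(int));
    if (positions == NULL) {
      return -1;
    }
  }

  for (i = 0; i < length; i++) {
    if (query[i] != ref[i]) {
      positions[n++] = i;
    }
  }

  if (positions != stackbuf) {
    free(positions);            /* only heap storage is released */
  }
  return n;
}
```

The release step has to repeat the decision made at allocation time. The diff does this by testing querylength against the same limit again, while the sketch compares the pointer against the local array.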
75087561 int indels, query_indel_pos, indel_pos, breakpoint;
75097562 int nmismatches, nmismatches_long, nmismatches_longcont, nmismatches_shift;
75107563 int nmismatches1, nmismatches2;
7564 int *mismatch_positions;
7565
75117566
75127567 #ifdef HAVE_ALLOCA
7513 int *mismatch_positions = (int *) ALLOCA(querylength*sizeof(int));
7514 #else
7515 int mismatch_positions[MAX_READLENGTH];
7568 if (querylength <= MAX_STACK_READLENGTH) {
7569 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
7570 } else {
7571 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
7572 }
7573 #else
7574 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
75167575 #endif
75177576
75187577
76497708 }
76507709 }
76517710 }
7711
7712 #ifdef HAVE_ALLOCA
7713 if (querylength <= MAX_STACK_READLENGTH) {
7714 FREEA(mismatch_positions);
7715 } else {
7716 FREE(mismatch_positions);
7717 }
7718 #else
7719 FREE(mismatch_positions);
7720 #endif
76527721
76537722 return hits;
76547723 }
77927861
77937862
77947863
7795 #if 0
7796 static void
7797 find_segmentm_span (Segment_T segmentm, int max_mismatches_allowed,
7798 int querylength, Compress_T query_compress,
7799 Univcoord_T left, bool plusp, int genestrand, bool first_read_p) {
7800 int nmismatches, i;
7801 int leftspan, rightspan, bestspan;
7802 #ifdef HAVE_ALLOCA
7803 int *mismatch_positions = (int *) ALLOCA(querylength*sizeof(int));
7804 #else
7805 int mismatch_positions[MAX_READLENGTH];
7806 #endif
7807
7808 /* Find all mismatches */
7809 nmismatches = Genome_mismatches_left(mismatch_positions,/*max_mismatches*/querylength,
7810 query_compress,left,/*pos5*/0,/*pos3*/querylength,
7811 plusp,genestrand,first_read_p);
7812
7813 if (nmismatches < max_mismatches_allowed) {
7814 segmentm->leftspan = 0;
7815 segmentm->rightspan = querylength;
7816 } else {
7817 segmentm->leftspan = 0;
7818 bestspan = segmentm->rightspan = mismatch_positions[max_mismatches_allowed] + /*slop*/ 1;
7819 for (i = 0; i < nmismatches - max_mismatches_allowed; i++) {
7820 leftspan = mismatch_positions[i];
7821 rightspan = mismatch_positions[i + max_mismatches_allowed + 1] + /*slop*/ 1;
7822 if (rightspan - leftspan > bestspan) {
7823 segmentm->leftspan = leftspan;
7824 segmentm->rightspan = rightspan;
7825 bestspan = rightspan - leftspan;
7826 } else if (rightspan - leftspan == bestspan) {
7827 segmentm->rightspan = rightspan;
7828 }
7829 }
7830 }
7831 return;
7832 }
7833 #endif
7834
7835
78367864 /* Copied from sarray-read.c */
78377865 static int
78387866 donor_match_length_cmp (const void *a, const void *b) {
78837911 int nmismatches_left, nmismatches_right;
78847912 int segmenti_donor_nknown, segmentj_acceptor_nknown,
78857913 segmentj_antidonor_nknown, segmenti_antiacceptor_nknown;
7914 int *mismatch_positions_left, *mismatch_positions_right;
7915 int *segmenti_donor_knownpos, *segmentj_acceptor_knownpos, *segmentj_antidonor_knownpos, *segmenti_antiacceptor_knownpos,
7916 *segmenti_donor_knowni, *segmentj_acceptor_knowni,
7917 *segmentj_antidonor_knowni, *segmenti_antiacceptor_knowni;
78867918
78877919 #ifdef HAVE_ALLOCA
7888 int *mismatch_positions_left = (int *) ALLOCA(querylength*sizeof(int));
7889 int *mismatch_positions_right = (int *) ALLOCA(querylength*sizeof(int));
7890 int *segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7891 int *segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7892 int *segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7893 int *segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7894 int *segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7895 int *segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7896 int *segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7897 int *segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7898 #else
7899 int mismatch_positions_left[MAX_READLENGTH], mismatch_positions_right[MAX_READLENGTH];
7900 int segmenti_donor_knownpos[MAX_READLENGTH+1], segmentj_acceptor_knownpos[MAX_READLENGTH+1],
7901 segmentj_antidonor_knownpos[MAX_READLENGTH+1], segmenti_antiacceptor_knownpos[MAX_READLENGTH+1];
7902 int segmenti_donor_knowni[MAX_READLENGTH+1], segmentj_acceptor_knowni[MAX_READLENGTH+1],
7903 segmentj_antidonor_knowni[MAX_READLENGTH+1], segmenti_antiacceptor_knowni[MAX_READLENGTH+1];
7920 if (querylength <= MAX_STACK_READLENGTH) {
7921 mismatch_positions_left = (int *) ALLOCA(querylength*sizeof(int));
7922 mismatch_positions_right = (int *) ALLOCA(querylength*sizeof(int));
7923 segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7924 segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7925 segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7926 segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
7927 segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7928 segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7929 segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7930 segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
7931 } else {
7932 mismatch_positions_left = (int *) MALLOC(querylength*sizeof(int));
7933 mismatch_positions_right = (int *) MALLOC(querylength*sizeof(int));
7934 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7935 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7936 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7937 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7938 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7939 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7940 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7941 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7942 }
7943 #else
7944 mismatch_positions_left = (int *) MALLOC(querylength*sizeof(int));
7945 mismatch_positions_right = (int *) MALLOC(querylength*sizeof(int));
7946 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7947 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7948 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7949 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
7950 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7951 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7952 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
7953 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
79047954 #endif
79057955
79067956 Chrpos_T max_distance;
81258175 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
81268176 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
81278177 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
8128 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8129 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8178 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8179 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
81308180 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
81318181 best_nmismatches = nmismatches;
81328182 }
81428192 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
81438193 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
81448194 debug7(printf("accepting distance %d, probabilities %f and %f\n",
8145 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8146 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8195 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8196 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
81478197 n_good_spliceends += 1;
81488198 accepted_hits = List_push(accepted_hits,(void *) hit);
81498199 } else {
81598209 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP ||
81608210 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
81618211 debug7(printf("accepting distance %d, probabilities %f and %f\n",
8162 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8163 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8212 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8213 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
81648214 n_good_spliceends += 1;
81658215 accepted_hits = List_push(accepted_hits,(void *) hit);
81668216 } else {
82278277 for (k = i; k < j; k++) {
82288278 acceptor = Stage3end_substring_acceptor(hitarray[k]);
82298279 #ifdef LARGE_GENOMES
8230 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
8231 #else
8232 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
8280 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
8281 #else
8282 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
82338283 #endif
82348284 amb_knowni = Intlist_push(amb_knowni,-1);
82358285 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
8236 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
8286 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
82378287 }
82388288
82398289 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
82988348 for (k = i; k < j; k++) {
82998349 donor = Stage3end_substring_donor(hitarray[k]);
83008350 #ifdef LARGE_GENOMES
8301 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
8302 #else
8303 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
8351 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
8352 #else
8353 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
83048354 #endif
83058355 amb_knowni = Intlist_push(amb_knowni,-1);
83068356 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
8307 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
8357 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
83088358 }
83098359
83108360 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
83588408 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
83598409 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
83608410 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
8361 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8362 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8411 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8412 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
83638413 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
83648414 best_nmismatches = nmismatches;
83658415 }
83778427 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
83788428 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
83798429 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
8380 Substring_chimera_prob(Stage3end_substring_donor(hit)),
8381 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8430 Substring_siteD_prob(Stage3end_substring_donor(hit)),
8431 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
83828432 n_good_spliceends += 1;
83838433 accepted_hits = List_push(accepted_hits,(void *) hit);
83848434 } else {
83968446 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
83978447 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
83988448 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
8399 Substring_chimera_prob(Stage3end_substring_donor(hit)),
8400 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8449 Substring_siteD_prob(Stage3end_substring_donor(hit)),
8450 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
84018451 n_good_spliceends += 1;
84028452 accepted_hits = List_push(accepted_hits,(void *) hit);
84038453 } else {
84648514 for (k = i; k < j; k++) {
84658515 acceptor = Stage3end_substring_acceptor(hitarray[k]);
84668516 #ifdef LARGE_GENOMES
8467 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
8468 #else
8469 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
8517 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
8518 #else
8519 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
84708520 #endif
84718521 amb_knowni = Intlist_push(amb_knowni,-1);
84728522 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
8473 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
8523 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
84748524 }
84758525
84768526 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
85358585 for (k = i; k < j; k++) {
85368586 donor = Stage3end_substring_donor(hitarray[k]);
85378587 #ifdef LARGE_GENOMES
8538 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
8539 #else
8540 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
8588 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
8589 #else
8590 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
85418591 #endif
85428592 amb_knowni = Intlist_push(amb_knowni,-1);
85438593 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
8544 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
8594 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
85458595 }
85468596
85478597 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
85878637 }
85888638 }
85898639
8640 #ifdef HAVE_ALLOCA
8641 if (querylength <= MAX_STACK_READLENGTH) {
8642 FREEA(mismatch_positions_left);
8643 FREEA(mismatch_positions_right);
8644 FREEA(segmenti_donor_knownpos);
8645 FREEA(segmentj_acceptor_knownpos);
8646 FREEA(segmentj_antidonor_knownpos);
8647 FREEA(segmenti_antiacceptor_knownpos);
8648 FREEA(segmenti_donor_knowni);
8649 FREEA(segmentj_acceptor_knowni);
8650 FREEA(segmentj_antidonor_knowni);
8651 FREEA(segmenti_antiacceptor_knowni);
8652 } else {
8653 FREE(mismatch_positions_left);
8654 FREE(mismatch_positions_right);
8655 FREE(segmenti_donor_knownpos);
8656 FREE(segmentj_acceptor_knownpos);
8657 FREE(segmentj_antidonor_knownpos);
8658 FREE(segmenti_antiacceptor_knownpos);
8659 FREE(segmenti_donor_knowni);
8660 FREE(segmentj_acceptor_knowni);
8661 FREE(segmentj_antidonor_knowni);
8662 FREE(segmenti_antiacceptor_knowni);
8663 }
8664 #else
8665 FREE(mismatch_positions_left);
8666 FREE(mismatch_positions_right);
8667 FREE(segmenti_donor_knownpos);
8668 FREE(segmentj_acceptor_knownpos);
8669 FREE(segmentj_antidonor_knownpos);
8670 FREE(segmenti_antiacceptor_knownpos);
8671 FREE(segmenti_donor_knowni);
8672 FREE(segmentj_acceptor_knowni);
8673 FREE(segmentj_antidonor_knowni);
8674 FREE(segmenti_antiacceptor_knowni);
8675 #endif
8676
85908677 debug(printf("Finished find_singlesplices_plus with %d hits and %d lowprob\n",
85918678 List_length(hits),List_length(*lowprob)));
85928679
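The allocation change in the hunks above follows one pattern: if `querylength` is at most `MAX_STACK_READLENGTH`, the work arrays come from the stack, otherwise from the heap, and the release code repeats the same test. Below is a minimal sketch of that pattern, not the GMAP code itself. It uses plain `alloca`, `malloc` and `free` in place of the `ALLOCA`, `MALLOC`, `FREEA` and `FREE` macros, whose definitions are not shown in this diff. The threshold value 300 and the function `sum_positions` are hypothetical, and `<alloca.h>` is assumed to be available (as on glibc).

```c
#include <alloca.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold; the real value is set at configure time. */
#define MAX_STACK_READLENGTH 300

/* Hypothetical example function: fills a work array of querylength
   ints with 0..querylength-1 and returns their sum, or -1 if the
   heap allocation fails. */
static int
sum_positions (int querylength) {
  int *positions;
  int use_stack_p, i, sum = 0;

  assert(querylength >= 0);
  use_stack_p = (querylength <= MAX_STACK_READLENGTH);

  if (use_stack_p) {
    /* Short read: stack memory, released automatically on return.
       alloca must be called in this function, not in a helper. */
    positions = (int *) alloca((size_t) querylength * sizeof(int));
  } else {
    /* Long read: heap memory, so a large array cannot overflow the stack. */
    positions = (int *) malloc((size_t) querylength * sizeof(int));
    if (positions == NULL) {
      return -1;
    }
  }

  for (i = 0; i < querylength; i++) {
    positions[i] = i;
  }
  for (i = 0; i < querylength; i++) {
    sum += positions[i];
  }

  /* Release with the same test used to allocate; only heap memory is freed. */
  if (!use_stack_p) {
    free(positions);
  }
  return sum;
}
```

Calling `sum_positions(10)` takes the stack path and `sum_positions(1000)` takes the heap path. The design point is that the allocation test and the release test must agree, which is why the diff duplicates the `querylength <= MAX_STACK_READLENGTH` check at the end of each function.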
86068693 int nmismatches_left, nmismatches_right;
86078694 int segmenti_donor_nknown, segmentj_acceptor_nknown,
86088695 segmentj_antidonor_nknown, segmenti_antiacceptor_nknown;
8609
8696 int *mismatch_positions_left, *mismatch_positions_right;
8697 int *segmenti_donor_knownpos, *segmentj_acceptor_knownpos, *segmentj_antidonor_knownpos, *segmenti_antiacceptor_knownpos,
8698 *segmenti_donor_knowni, *segmentj_acceptor_knowni,
8699 *segmentj_antidonor_knowni, *segmenti_antiacceptor_knowni;
8700
86108701 #ifdef HAVE_ALLOCA
8611 int *mismatch_positions_left = (int *) ALLOCA(querylength*sizeof(int));
8612 int *mismatch_positions_right = (int *) ALLOCA(querylength*sizeof(int));
8613 int *segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8614 int *segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8615 int *segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8616 int *segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8617 int *segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8618 int *segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8619 int *segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8620 int *segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8621 #else
8622 int mismatch_positions_left[MAX_READLENGTH], mismatch_positions_right[MAX_READLENGTH];
8623 int segmenti_donor_knownpos[MAX_READLENGTH+1], segmentj_acceptor_knownpos[MAX_READLENGTH+1],
8624 segmentj_antidonor_knownpos[MAX_READLENGTH+1], segmenti_antiacceptor_knownpos[MAX_READLENGTH+1];
8625 int segmenti_donor_knowni[MAX_READLENGTH+1], segmentj_acceptor_knowni[MAX_READLENGTH+1],
8626 segmentj_antidonor_knowni[MAX_READLENGTH+1], segmenti_antiacceptor_knowni[MAX_READLENGTH+1];
8702 if (querylength <= MAX_STACK_READLENGTH) {
8703 mismatch_positions_left = (int *) ALLOCA(querylength*sizeof(int));
8704 mismatch_positions_right = (int *) ALLOCA(querylength*sizeof(int));
8705 segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8706 segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8707 segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8708 segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
8709 segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8710 segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8711 segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8712 segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
8713 } else {
8714 mismatch_positions_left = (int *) MALLOC(querylength*sizeof(int));
8715 mismatch_positions_right = (int *) MALLOC(querylength*sizeof(int));
8716 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8717 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8718 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8719 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8720 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8721 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8722 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8723 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8724 }
8725 #else
8726 mismatch_positions_left = (int *) MALLOC(querylength*sizeof(int));
8727 mismatch_positions_right = (int *) MALLOC(querylength*sizeof(int));
8728 segmenti_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8729 segmentj_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8730 segmentj_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8731 segmenti_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
8732 segmenti_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8733 segmentj_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8734 segmentj_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
8735 segmenti_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
86278736 #endif
86288737
86298738 Chrpos_T max_distance;
88478956 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
88488957 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
88498958 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
8850 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8851 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8959 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8960 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
88528961 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
88538962 best_nmismatches = nmismatches;
88548963 }
88648973 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
88658974 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
88668975 debug7(printf("accepting distance %d, probabilities %f and %f\n",
8867 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8868 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8976 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8977 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
88698978 n_good_spliceends += 1;
88708979 accepted_hits = List_push(accepted_hits,(void *) hit);
88718980 } else {
88818990 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP ||
88828991 Stage3end_chimera_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP) {
88838992 debug7(printf("accepting distance %d, probabilities %f and %f\n",
8884 Stage3end_distance(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
8885 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
8993 Stage3end_distance(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
8994 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
88868995 n_good_spliceends += 1;
88878996 accepted_hits = List_push(accepted_hits,(void *) hit);
88888997 } else {
89499058 for (k = i; k < j; k++) {
89509059 acceptor = Stage3end_substring_acceptor(hitarray[k]);
89519060 #ifdef LARGE_GENOMES
8952 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
8953 #else
8954 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
9061 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
9062 #else
9063 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
89559064 #endif
89569065 amb_knowni = Intlist_push(amb_knowni,-1);
89579066 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
8958 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
9067 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
89599068 }
89609069
89619070 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
90209129 for (k = i; k < j; k++) {
90219130 donor = Stage3end_substring_donor(hitarray[k]);
90229131 #ifdef LARGE_GENOMES
9023 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
9024 #else
9025 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
9132 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
9133 #else
9134 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
90269135 #endif
90279136 amb_knowni = Intlist_push(amb_knowni,-1);
90289137 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
9029 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
9138 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
90309139 }
90319140
90329141 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
90809189 Substring_genomicstart(Stage3end_substring_donor(hit)),Substring_genomicend(Stage3end_substring_donor(hit)),
90819190 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
90829191 Substring_genomicstart(Stage3end_substring_acceptor(hit)),Substring_genomicend(Stage3end_substring_acceptor(hit)),
9083 Stage3end_nmismatches_whole(hit),Substring_chimera_prob(Stage3end_substring_donor(hit)),
9084 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
9192 Stage3end_nmismatches_whole(hit),Substring_siteD_prob(Stage3end_substring_donor(hit)),
9193 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
90859194 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
90869195 best_nmismatches = nmismatches;
90879196 }
90999208 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
91009209 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
91019210 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
9102 Substring_chimera_prob(Stage3end_substring_donor(hit)),
9103 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
9211 Substring_siteD_prob(Stage3end_substring_donor(hit)),
9212 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
91049213 n_good_spliceends += 1;
91059214 accepted_hits = List_push(accepted_hits,(void *) hit);
91069215 } else {
91189227 debug7(printf("accepting distance %d, donor length %d and acceptor length %d, probabilities %f and %f\n",
91199228 Stage3end_distance(hit),Substring_match_length_orig(Stage3end_substring_donor(hit)),
91209229 Substring_match_length_orig(Stage3end_substring_acceptor(hit)),
9121 Substring_chimera_prob(Stage3end_substring_donor(hit)),
9122 Substring_chimera_prob(Stage3end_substring_acceptor(hit))));
9230 Substring_siteD_prob(Stage3end_substring_donor(hit)),
9231 Substring_siteA_prob(Stage3end_substring_acceptor(hit))));
91239232 n_good_spliceends += 1;
91249233 accepted_hits = List_push(accepted_hits,(void *) hit);
91259234 } else {
91869295 for (k = i; k < j; k++) {
91879296 acceptor = Stage3end_substring_acceptor(hitarray[k]);
91889297 #ifdef LARGE_GENOMES
9189 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(acceptor));
9190 #else
9191 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(acceptor));
9298 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_A(acceptor));
9299 #else
9300 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_A(acceptor));
91929301 #endif
91939302 amb_knowni = Intlist_push(amb_knowni,-1);
91949303 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(acceptor));
9195 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(acceptor));
9304 amb_probs = Doublelist_push(amb_probs,Substring_siteA_prob(acceptor));
91969305 }
91979306
91989307 nmismatches_acceptor = best_nmismatches - Substring_nmismatches_whole(donor);
92569365 for (k = i; k < j; k++) {
92579366 donor = Stage3end_substring_donor(hitarray[k]);
92589367 #ifdef LARGE_GENOMES
9259 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord(donor));
9260 #else
9261 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord(donor));
9368 ambcoords = Uint8list_push(ambcoords,Substring_splicecoord_D(donor));
9369 #else
9370 ambcoords = Uintlist_push(ambcoords,Substring_splicecoord_D(donor));
92629371 #endif
92639372 amb_knowni = Intlist_push(amb_knowni,-1);
92649373 amb_nmismatches = Intlist_push(amb_nmismatches,Substring_nmismatches_whole(donor));
9265 amb_probs = Doublelist_push(amb_probs,Substring_chimera_prob(donor));
9374 amb_probs = Doublelist_push(amb_probs,Substring_siteD_prob(donor));
92669375 }
92679376
92689377 nmismatches_donor = best_nmismatches - Substring_nmismatches_whole(acceptor);
93089417 }
93099418 }
93109419
9420 #ifdef HAVE_ALLOCA
9421 if (querylength <= MAX_STACK_READLENGTH) {
9422 FREEA(mismatch_positions_left);
9423 FREEA(mismatch_positions_right);
9424 FREEA(segmenti_donor_knownpos);
9425 FREEA(segmentj_acceptor_knownpos);
9426 FREEA(segmentj_antidonor_knownpos);
9427 FREEA(segmenti_antiacceptor_knownpos);
9428 FREEA(segmenti_donor_knowni);
9429 FREEA(segmentj_acceptor_knowni);
9430 FREEA(segmentj_antidonor_knowni);
9431 FREEA(segmenti_antiacceptor_knowni);
9432 } else {
9433 FREE(mismatch_positions_left);
9434 FREE(mismatch_positions_right);
9435 FREE(segmenti_donor_knownpos);
9436 FREE(segmentj_acceptor_knownpos);
9437 FREE(segmentj_antidonor_knownpos);
9438 FREE(segmenti_antiacceptor_knownpos);
9439 FREE(segmenti_donor_knowni);
9440 FREE(segmentj_acceptor_knowni);
9441 FREE(segmentj_antidonor_knowni);
9442 FREE(segmenti_antiacceptor_knowni);
9443 }
9444 #else
9445 FREE(mismatch_positions_left);
9446 FREE(mismatch_positions_right);
9447 FREE(segmenti_donor_knownpos);
9448 FREE(segmentj_acceptor_knownpos);
9449 FREE(segmentj_antidonor_knownpos);
9450 FREE(segmenti_antiacceptor_knownpos);
9451 FREE(segmenti_donor_knowni);
9452 FREE(segmentj_acceptor_knowni);
9453 FREE(segmentj_antidonor_knowni);
9454 FREE(segmenti_antiacceptor_knowni);
9455 #endif
9456
93119457 debug(printf("Finished find_singlesplices_minus with %d hits and %d lowprob\n",
93129458 List_length(hits),List_length(*lowprob)));
93139459
94019547 #endif
94029548
94039549
9404 #if 0
9405 static List_T
9406 find_doublesplices (int *found_score, List_T hits, List_T *lowprob,
9407 Segment_T *spliceable, int nspliceable, struct Segment_T *segments,
9408 char *queryptr, int querylength, int query_lastpos, Compress_T query_compress,
9409 Chrpos_T max_distance, int splicing_penalty, int min_shortend,
9410 int max_mismatches_allowed, bool pairedp, bool first_read_p,
9411 bool plusp, int genestrand, bool subs_or_indels_p) {
9412 int j, j1, j2, joffset, jj;
9413
9414 Segment_T segmenti, segmentj, segmentm, segmenti_start, segmentj_end, *ptr;
9415 List_T potentiali, potentialj, q, r;
9416 Univcoord_T segmenti_left, segmentj_left, segmentm_left;
9417 int segmenti_donor_nknown, segmentj_acceptor_nknown,
9418 segmentj_antidonor_nknown, segmenti_antiacceptor_nknown,
9419 segmentm_donor_nknown, segmentm_acceptor_nknown,
9420 segmentm_antidonor_nknown, segmentm_antiacceptor_nknown;
9421
9422 #ifdef HAVE_ALLOCA
9423 int *segmenti_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9424 int *segmentj_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9425 int *segmentj_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9426 int *segmenti_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9427 int *segmentm_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9428 int *segmentm_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9429 int *segmentm_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9430 int *segmentm_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
9431 int *segmenti_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9432 int *segmentj_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9433 int *segmentj_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9434 int *segmenti_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9435 int *segmentm_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9436 int *segmentm_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9437 int *segmentm_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9438 int *segmentm_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
9439 #else
9440 int segmenti_donor_knownpos[MAX_READLENGTH+1], segmentj_acceptor_knownpos[MAX_READLENGTH+1],
9441 segmentj_antidonor_knownpos[MAX_READLENGTH+1], segmenti_antiacceptor_knownpos[MAX_READLENGTH+1],
9442 segmentm_donor_knownpos[MAX_READLENGTH+1], segmentm_acceptor_knownpos[MAX_READLENGTH+1],
9443 segmentm_antidonor_knownpos[MAX_READLENGTH+1], segmentm_antiacceptor_knownpos[MAX_READLENGTH+1];
9444 int segmenti_donor_knowni[MAX_READLENGTH+1], segmentj_acceptor_knowni[MAX_READLENGTH+1],
9445 segmentj_antidonor_knowni[MAX_READLENGTH+1], segmenti_antiacceptor_knowni[MAX_READLENGTH+1],
9446 segmentm_donor_knowni[MAX_READLENGTH+1], segmentm_acceptor_knowni[MAX_READLENGTH+1],
9447 segmentm_antidonor_knowni[MAX_READLENGTH+1], segmentm_antiacceptor_knowni[MAX_READLENGTH+1];
9448 #endif
9449
9450 #ifdef LARGE_GENOMES
9451 Uint8list_T donor_ambcoords, acceptor_ambcoords, ambcoords_donor, ambcoords_acceptor;
9452 #else
9453 Uintlist_T donor_ambcoords, acceptor_ambcoords, ambcoords_donor, ambcoords_acceptor;
9454 #endif
9455 Intlist_T splicesites_i_left, splicesites_i_right;
9456 Intlist_T nmismatches_list_left, nmismatches_list_right;
9457 bool ambp_left, ambp_right;
9458 int sensedir;
9459 /* int *floors_from_neg3, *floors_to_pos3; */
9460
9461 int nmismatches_shortexon_left, nmismatches_shortexon_middle, nmismatches_shortexon_right;
9462 int amb_length_donor, amb_length_acceptor;
9463 int best_left_j, best_right_j;
9464 bool shortexon_orig_plusp, shortexon_orig_minusp, saw_antidonor_p, saw_acceptor_p;
9465 int leftpos, rightpos;
9466 Substring_T donor, acceptor, shortexon;
9467
9468 int nhits_local /*= 0*/, npotential_left, npotential_right;
9469 int donor_length, acceptor_length;
9470 List_T accepted_hits, rejected_hits, single_ambig_hits;
9471 List_T spliceends, p;
9472 Stage3end_T hit, *hitarray;
9473 int best_nmismatches, nmismatches;
9474 int n_good_spliceends, n, i, k;
9475 double best_prob, prob;
9476 Univcoord_T lastpos;
9477 Intlist_T donor_amb_knowni, acceptor_amb_knowni, donor_amb_nmismatches, acceptor_amb_nmismatches;
9478 Doublelist_T donor_amb_probs, acceptor_amb_probs, probs_donor, probs_acceptor;
9479
9480
9481 debug(printf("*** Starting find_known_doublesplices on %d segments ***\n",nspliceable));
9482 debug(printf("Initially have %d hits\n",List_length(hits)));
9483
9484 /* floors_from_neg3 = floors->scorefrom[-index1interval]; */
9485 /* floors_to_pos3 = floors->scoreto[query_lastpos+index1interval]; */
9486
9487 for (ptr = spliceable; ptr < &(spliceable[nspliceable]); ptr++) {
9488 segmentm = *ptr;
9489 if (1 || segmentm->diagonal < (Univcoord_T) -1) { /* No markers were stored in spliceable */
9490 segmentm_left = segmentm->diagonal - querylength;
9491
9492 shortexon_orig_plusp = shortexon_orig_minusp = false;
9493 saw_acceptor_p = saw_antidonor_p = false;
9494
9495 segmentm_donor_nknown = 0;
9496 segmentm_acceptor_nknown = 0;
9497 segmentm_antidonor_nknown = 0;
9498 segmentm_antiacceptor_nknown = 0;
9499
9500 if ((joffset = segmentm->splicesites_i) >= 0) {
9501 j = joffset;
9502 while (j < nsplicesites && splicesites[j] < segmentm->diagonal) {
9503 if (splicetypes[j] == DONOR) {
9504 debug4k(printf("Setting known donor %d for segmentm at %llu\n",j,(unsigned long long) splicesites[j]));
9505 segmentm_donor_knownpos[segmentm_donor_nknown] = splicesites[j] - segmentm_left;
9506 segmentm_donor_knowni[segmentm_donor_nknown++] = j;
9507 if (saw_acceptor_p == true) {
9508 /* acceptor...donor */
9509 shortexon_orig_plusp = true;
9510 }
9511 } else if (splicetypes[j] == ANTIACCEPTOR) {
9512 debug4k(printf("Setting known antiacceptor %d for segmentm at %llu\n",j,(unsigned long long) splicesites[j]));
9513 segmentm_antiacceptor_knownpos[segmentm_antiacceptor_nknown] = splicesites[j] - segmentm_left;
9514 segmentm_antiacceptor_knowni[segmentm_antiacceptor_nknown++] = j;
9515 if (saw_antidonor_p == true) {
9516 /* antidonor...antiacceptor */
9517 shortexon_orig_minusp = true;
9518 }
9519 } else if (splicetypes[j] == ACCEPTOR) {
9520 debug4k(printf("Saw known acceptor at %llu\n",(unsigned long long) splicesites[j]));
9521 segmentm_acceptor_knownpos[segmentm_acceptor_nknown] = splicesites[j] - segmentm_left;
9522 segmentm_acceptor_knowni[segmentm_acceptor_nknown++] = j;
9523 saw_acceptor_p = true;
9524 } else if (splicetypes[j] == ANTIDONOR) {
9525 debug4k(printf("Saw known antidonor at %llu\n",(unsigned long long) splicesites[j]));
9526 segmentm_antidonor_knownpos[segmentm_antidonor_nknown] = splicesites[j] - segmentm_left;
9527 segmentm_antidonor_knowni[segmentm_antidonor_nknown++] = j;
9528 saw_antidonor_p = true;
9529 }
9530 j++;
9531 }
9532 }
9533
9534 /* Novel splicing. Do not alter j. */
9535 /* Still necessary to check segmentm querypos to achieve speed */
9536 if (novelsplicingp &&
9537 segmentm->querypos3 >= index1part && segmentm->querypos5 <= query_lastpos - index1part &&
9538 segmentm->left_splice_p == true && segmentm->right_splice_p == true) {
9539 debug4d(printf("segment diagonal %llu, querypos %d..%d\n",
9540 (unsigned long long) segmentm->diagonal,segmentm->querypos5,segmentm->querypos3));
9541
9542 spliceends = (List_T) NULL;
9543
9544 /* Identify potential segmenti for segmentm */
9545 segmenti_start = segmentm-1;
9546 while (
9547 /* Cannot use marker segments going leftward */
9548 segmenti_start >= &(segments[0]) &&
9549 segmenti_start->diagonal < (Univcoord_T) -1 && /* Needs to be next criterion, since we initialize only segments[0]->diagonal */
9550 segmenti_start->chrnum == segmentm->chrnum &&
9551 segmentm->diagonal <= segmenti_start->diagonal + max_distance) {
9552 segmenti_start--;
9553 }
9554
9555 /* Identify potential segmentj for segmentm */
9556 segmentj_end = segmentm+1;
9557 while (
9558 #ifdef NO_MARKER_SEGMENTS
9559 segmentj_end < &(segments[nsegments]) && segmentj_end->chrnum == segmentm->chrnum &&
9560 #endif
9561 segmentj_end->diagonal <= segmentm->diagonal + max_distance) {
9562 segmentj_end++;
9563 }
9564
9565 potentiali = (List_T) NULL;
9566 potentialj = (List_T) NULL;
9567 npotential_left = 0;
9568 npotential_right = 0;
9569 if ((segmentm - segmenti_start) * (segmentj_end - segmentm) >= MAX_LOCALSPLICING_POTENTIAL) {
9570 /* Too many to check */
9571 /* segmenti_start = segmentm-1 - MAX_LOCALSPLICING_POTENTIAL; */
9572 /* segmentj_end = segmentm+1 + MAX_LOCALSPLICING_POTENTIAL; */
9573 segmenti = segmenti_start; /* Don't process any */
9574 segmentj = segmentj_end; /* Don't process any */
9575 } else {
9576 segmenti = segmentm-1;
9577 segmentj = segmentm+1;
9578 }
9579
9580 for ( ; segmenti > segmenti_start; segmenti--) {
9581 debug4d(printf("local left? diagonal %llu, querypos %d..%d => diagonal %llu, querypos %d..%d\n",
9582 (unsigned long long) segmenti->diagonal,segmenti->querypos5,segmenti->querypos3,
9583 (unsigned long long) segmentm->diagonal,segmentm->querypos5,segmentm->querypos3));
9584 /* i5 i3 m5 m3 */
9585 assert(segmenti->diagonal < segmentm->diagonal);
9586 if (segmenti->leftmost < 0) {
9587 /* Failed outer floor test in find_singlesplices */
9588 } else if (plusp == true && segmenti->querypos3 >= segmentm->querypos5) {
9589 debug4d(printf("Bad querypos\n"));
9590 } else if (plusp == false && segmentm->querypos3 >= segmenti->querypos5) {
9591 debug4d(printf("Bad querypos\n"));
9592 } else if (segmenti->diagonal + min_intronlength > segmentm->diagonal) {
9593 debug4d(printf("Too short\n"));
9594 } else {
9595 potentiali = List_push(potentiali,(void *) segmenti);
9596 npotential_left++;
9597 debug4d(printf("Potential left #%d: %llu\n",npotential_left,(unsigned long long) segmenti->diagonal));
9598 }
9599 }
9600
9601 for ( ; segmentj < segmentj_end; segmentj++) {
9602 debug4d(printf("local right? diagonal %llu, querypos %d..%d => diagonal %llu, querypos %d..%d\n",
9603 (unsigned long long) segmentm->diagonal,segmentm->querypos5,segmentm->querypos3,
9604 (unsigned long long) segmentj->diagonal,segmentj->querypos5,segmentj->querypos3));
9605 /* m5 m3 j5 j3 */
9606 assert(segmentm->diagonal < segmentj->diagonal);
9607 if (segmentj->rightmost < 0) {
9608 /* Failed outer floor test in find_singlesplices */
9609 } else if (plusp == true && segmentm->querypos3 >= segmentj->querypos5) {
9610 debug4d(printf("Bad querypos\n"));
9611 } else if (plusp == false && segmentj->querypos3 >= segmentm->querypos5) {
9612 debug4d(printf("Bad querypos\n"));
9613 } else if (segmentm->diagonal + min_intronlength > segmentj->diagonal) {
9614 debug4d(printf("Too short\n"));
9615 } else {
9616 potentialj = List_push(potentialj,(void *) segmentj);
9617 npotential_right++;
9618 debug4d(printf("Potential right #%d: %llu\n",npotential_right,(unsigned long long) segmentj->diagonal));
9619 }
9620 }
9621
9622 if (npotential_left > 0 && npotential_right > 0) {
9623 segmentm_donor_knownpos[segmentm_donor_nknown] = querylength;
9624 segmentm_acceptor_knownpos[segmentm_acceptor_nknown] = querylength;
9625 segmentm_antidonor_knownpos[segmentm_antidonor_nknown] = querylength;
9626 segmentm_antiacceptor_knownpos[segmentm_antiacceptor_nknown] = querylength;
9627
9628 for (q = potentiali; q != NULL; q = List_next(q)) {
9629 segmenti = (Segment_T) List_head(q);
9630 segmenti_left = segmenti->diagonal - querylength;
9631
9632 /* Set known sites for segmenti */
9633 segmenti_donor_nknown = 0;
9634 segmenti_antiacceptor_nknown = 0;
9635 if ((jj = segmenti->splicesites_i) >= 0) {
9636 while (jj < nsplicesites && splicesites[jj] < segmenti->diagonal) {
9637 if (splicetypes[jj] == DONOR) {
9638 debug4d(printf("Setting known donor %d for segmenti at %llu\n",jj,(unsigned long long) splicesites[jj]));
9639 segmenti_donor_knownpos[segmenti_donor_nknown] = splicesites[jj] - segmenti_left;
9640 segmenti_donor_knowni[segmenti_donor_nknown++] = jj;
9641 } else if (splicetypes[jj] == ANTIACCEPTOR) {
9642 debug4d(printf("Setting known antiacceptor %d for segmenti at %llu\n",jj,(unsigned long long) splicesites[jj]));
9643 segmenti_antiacceptor_knownpos[segmenti_antiacceptor_nknown] = splicesites[jj] - segmenti_left;
9644 segmenti_antiacceptor_knowni[segmenti_antiacceptor_nknown++] = jj;
9645 }
9646 jj++;
9647 }
9648 }
9649 segmenti_donor_knownpos[segmenti_donor_nknown] = querylength;
9650 segmenti_antiacceptor_knownpos[segmenti_antiacceptor_nknown] = querylength;
9651
9652
9653 for (r = potentialj; r != NULL; r = List_next(r)) {
9654 segmentj = (Segment_T) List_head(r);
9655
9656 debug4d(printf("Doublesplice span test (%d mismatches allowed): %d mismatches found from leftmost %d to j.rightmost %d\n",
9657 max_mismatches_allowed,
9658 Genome_count_mismatches_substring(query_compress,segmentm_left,
9659 /*pos5*/segmenti->leftmost,/*pos3*/segmentj->rightmost,
9660 plusp,genestrand,first_read_p),
9661 segmenti->leftmost,segmentj->rightmost));
9662
9663 if (segmenti->leftmost >= segmentj->rightmost) {
9664 debug4d(printf("Double splice is not possible with pos5 %d >= pos3 %d\n",
9665 segmenti->leftmost,segmentj->rightmost));
9666 } else if (Genome_count_mismatches_limit(query_compress,segmentm_left,
9667 /*pos5*/segmenti->leftmost,/*pos3*/segmentj->rightmost,
9668 max_mismatches_allowed,plusp,genestrand) <= max_mismatches_allowed) {
9669 debug4d(printf("Double splice is possible\n"));
9670 segmentj_left = segmentj->diagonal - querylength;
9671
9672 /* Set known sites for segmentj */
9673 segmentj_acceptor_nknown = 0;
9674 segmentj_antidonor_nknown = 0;
9675 if ((jj = segmentj->splicesites_i) >= 0) {
9676 while (jj < nsplicesites && splicesites[jj] < segmentj->diagonal) {
9677 if (splicetypes[jj] == ACCEPTOR) {
9678 debug4d(printf("Setting known acceptor %d for segmentj at %llu\n",jj,(unsigned long long) splicesites[jj]));
9679 segmentj_acceptor_knownpos[segmentj_acceptor_nknown] = splicesites[jj] - segmentj_left;
9680 segmentj_acceptor_knowni[segmentj_acceptor_nknown++] = jj;
9681 } else if (splicetypes[jj] == ANTIDONOR) {
9682 debug4d(printf("Setting known antidonor %d for segmentj at %llu\n",jj,(unsigned long long) splicesites[jj]));
9683 segmentj_antidonor_knownpos[segmentj_antidonor_nknown] = splicesites[jj] - segmentj_left;
9684 segmentj_antidonor_knowni[segmentj_antidonor_nknown++] = jj;
9685 }
9686 jj++;
9687 }
9688 }
9689 segmentj_acceptor_knownpos[segmentj_acceptor_nknown] = querylength;
9690 segmentj_antidonor_knownpos[segmentj_antidonor_nknown] = querylength;
9691
9692 debug4d(printf(" => checking for double splice: Splice_solve_double\n"));
9693 spliceends = Splice_solve_double(&(*found_score),&nhits_local,spliceends,&(*lowprob),
9694 &segmenti->usedp,&segmentm->usedp,&segmentj->usedp,
9695 /*segmenti_left*/segmenti->diagonal - querylength,
9696 /*segmentm_left*/segmentm->diagonal - querylength,
9697 /*segmentj_left*/segmentj->diagonal - querylength,
9698 segmenti->chrnum,segmenti->chroffset,segmenti->chrhigh,segmenti->chrlength,
9699 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength,
9700 segmentj->chrnum,segmentj->chroffset,segmentj->chrhigh,segmentj->chrlength,
9701 querylength,query_compress,
9702 segmenti_donor_knownpos,segmentm_acceptor_knownpos,segmentm_donor_knownpos,segmentj_acceptor_knownpos,
9703 segmentj_antidonor_knownpos,segmentm_antiacceptor_knownpos,segmentm_antidonor_knownpos,segmenti_antiacceptor_knownpos,
9704 segmenti_donor_knowni,segmentm_acceptor_knowni,segmentm_donor_knowni,segmentj_acceptor_knowni,
9705 segmentj_antidonor_knowni,segmentm_antiacceptor_knowni,segmentm_antidonor_knowni,segmenti_antiacceptor_knowni,
9706 segmenti_donor_nknown,segmentm_acceptor_nknown,segmentm_donor_nknown,segmentj_acceptor_nknown,
9707 segmentj_antidonor_nknown,segmentm_antiacceptor_nknown,segmentm_antidonor_nknown,segmenti_antiacceptor_nknown,
9708 splicing_penalty,max_mismatches_allowed,plusp,genestrand,
9709 subs_or_indels_p,/*sarrayp*/false);
9710 }
9711 }
9712 }
9713 }
9714
9715 List_free(&potentialj);
9716 List_free(&potentiali);
9717
9718 /* Process results for segmentm. */
9719 if (spliceends != NULL) {
9720 best_nmismatches = querylength;
9721 best_prob = 0.0;
9722 for (p = spliceends; p != NULL; p = List_next(p)) {
9723 hit = (Stage3end_T) List_head(p);
9724 debug7(printf("analyzing distance %d, nmismatches %d, probability %f\n",
9725 Stage3end_distance(hit),Stage3end_nmismatches_whole(hit),
9726 Stage3end_shortexon_prob(hit)));
9727 if ((nmismatches = Stage3end_nmismatches_whole(hit)) < best_nmismatches) {
9728 best_nmismatches = nmismatches;
9729 }
9730 if ((prob = Stage3end_shortexon_prob(hit)) > best_prob) {
9731 best_prob = prob;
9732 }
9733 }
9734
9735 n_good_spliceends = 0;
9736 accepted_hits = rejected_hits = (List_T) NULL;
9737 for (p = spliceends; p != NULL; p = List_next(p)) {
9738 hit = (Stage3end_T) List_head(p);
9739 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP &&
9740 (Stage3end_shortexon_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP)) {
9741 debug7(printf("accepting distance %d, nmismatches %d, probability %f\n",
9742 Stage3end_distance(hit),Stage3end_nmismatches_whole(hit),
9743 Stage3end_shortexon_prob(hit)));
9744 n_good_spliceends += 1;
9745 accepted_hits = List_push(accepted_hits,(void *) hit);
9746 } else {
9747 rejected_hits = List_push(rejected_hits,(void *) hit);
9748 }
9749 }
9750
9751 if (n_good_spliceends == 0) {
9752 /* Conjunction is too strict. Allow for disjunction instead. */
9753 List_free(&rejected_hits);
9754 for (p = spliceends; p != NULL; p = List_next(p)) {
9755 hit = (Stage3end_T) List_head(p);
9756 if (Stage3end_nmismatches_whole(hit) <= best_nmismatches + LOCALSPLICING_NMATCHES_SLOP ||
9757 (Stage3end_shortexon_prob(hit) >= best_prob - LOCALSPLICING_PROB_SLOP)) {
9758 debug7(printf("accepting distance %d, nmismatches %d, probability %f\n",
9759 Stage3end_distance(hit),Stage3end_nmismatches_whole(hit),
9760 Stage3end_shortexon_prob(hit)));
9761 n_good_spliceends += 1;
9762 accepted_hits = List_push(accepted_hits,(void *) hit);
9763 } else {
9764 rejected_hits = List_push(rejected_hits,(void *) hit);
9765 }
9766 }
9767 }
9768
9769 for (p = rejected_hits; p != NULL; p = List_next(p)) {
9770 hit = (Stage3end_T) List_head(p);
9771 Stage3end_free(&hit);
9772 }
9773 List_free(&rejected_hits);
9774 List_free(&spliceends);
9775
9776 if (n_good_spliceends == 1) {
9777 hits = List_push(hits,List_head(accepted_hits));
9778 List_free(&accepted_hits);
9779
9780 } else {
9781 /* 5. Multiple hits, shortexon */
9782 debug7(printf("multiple splice hits, shortexon\n"));
9783
9784 /* Process multiple double ambiguous first */
9785 hitarray = (Stage3end_T *) List_to_array_n(&n,accepted_hits);
9786 qsort(hitarray,n,sizeof(Stage3end_T),substringD_match_length_cmp);
9787 List_free(&accepted_hits);
9788 single_ambig_hits = (List_T) NULL;
9789
9790 i = 0;
9791 while (i < n) {
9792 hit = hitarray[i];
9793 donor = Stage3end_substringD(hit);
9794 donor_length = Substring_match_length_orig(donor);
9795 acceptor = Stage3end_substringA(hit);
9796 acceptor_length = Substring_match_length_orig(acceptor);
9797 j = i + 1;
9798 while (j < n && Substring_match_length_orig(Stage3end_substringD(hitarray[j])) == donor_length &&
9799 Substring_match_length_orig(Stage3end_substringA(hitarray[j])) == acceptor_length) {
9800 j++;
9801 }
9802 if (j == i + 1) {
9803 /* Save for later analysis */
9804 single_ambig_hits = List_push(single_ambig_hits,(void *) hit);
9805 } else {
9806 donor_ambcoords = acceptor_ambcoords = NULL;
9807 donor_amb_knowni = acceptor_amb_knowni = (Intlist_T) NULL;
9808 donor_amb_nmismatches = acceptor_amb_nmismatches = (Intlist_T) NULL;
9809 donor_amb_probs = acceptor_amb_probs = (Doublelist_T) NULL;
9810
9811 qsort(&(hitarray[i]),j-i,sizeof(Stage3end_T),Stage3end_shortexon_substringD_cmp);
9812 donor = Stage3end_substringD(hitarray[i]);
9813 #ifdef LARGE_GENOMES
9814 donor_ambcoords = Uint8list_push(donor_ambcoords,Substring_splicecoord(donor));
9815 #else
9816 donor_ambcoords = Uintlist_push(donor_ambcoords,Substring_splicecoord(donor));
9817 #endif
9818 donor_amb_knowni = Intlist_push(donor_amb_knowni,-1);
9819 donor_amb_nmismatches = Intlist_push(donor_amb_nmismatches,Substring_nmismatches_whole(donor));
9820 donor_amb_probs = Doublelist_push(donor_amb_probs,Substring_chimera_prob(donor));
9821
9822 lastpos = Substring_left_genomicseg(donor);
9823 for (k = i + 1; k < j; k++) {
9824 donor = Stage3end_substringD(hitarray[k]);
9825 if (Substring_left_genomicseg(donor) != lastpos) {
9826 #ifdef LARGE_GENOMES
9827 donor_ambcoords = Uint8list_push(donor_ambcoords,Substring_splicecoord(donor));
9828 #else
9829 donor_ambcoords = Uintlist_push(donor_ambcoords,Substring_splicecoord(donor));
9830 #endif
9831 donor_amb_knowni = Intlist_push(donor_amb_knowni,-1);
9832 donor_amb_nmismatches = Intlist_push(donor_amb_nmismatches,Substring_nmismatches_whole(donor));
9833 donor_amb_probs = Doublelist_push(donor_amb_probs,Substring_chimera_prob(donor));
9834 }
9835 }
9836
9837 qsort(&(hitarray[i]),j-i,sizeof(Stage3end_T),Stage3end_shortexon_substringA_cmp);
9838 acceptor = Stage3end_substringA(hitarray[i]);
9839 #ifdef LARGE_GENOMES
9840 acceptor_ambcoords = Uint8list_push(acceptor_ambcoords,Substring_splicecoord(acceptor));
9841 #else
9842 acceptor_ambcoords = Uintlist_push(acceptor_ambcoords,Substring_splicecoord(acceptor));
9843 #endif
9844 acceptor_amb_knowni = Intlist_push(acceptor_amb_knowni,-1);
9845 acceptor_amb_nmismatches = Intlist_push(acceptor_amb_nmismatches,Substring_nmismatches_whole(acceptor));
9846 acceptor_amb_probs = Doublelist_push(acceptor_amb_probs,Substring_chimera_prob(acceptor));
9847
9848 lastpos = Substring_left_genomicseg(acceptor);
9849 for (k = i + 1; k < j; k++) {
9850 acceptor = Stage3end_substringA(hitarray[k]);
9851 if (Substring_left_genomicseg(acceptor) != lastpos) {
9852 #ifdef LARGE_GENOMES
9853 acceptor_ambcoords = Uint8list_push(acceptor_ambcoords,Substring_splicecoord(acceptor));
9854 #else
9855 acceptor_ambcoords = Uintlist_push(acceptor_ambcoords,Substring_splicecoord(acceptor));
9856 #endif
9857 acceptor_amb_knowni = Intlist_push(acceptor_amb_knowni,-1);
9858 acceptor_amb_nmismatches = Intlist_push(acceptor_amb_nmismatches,Substring_nmismatches_whole(acceptor));
9859 acceptor_amb_probs = Doublelist_push(acceptor_amb_probs,Substring_chimera_prob(acceptor));
9860 }
9861 }
9862
9863 shortexon = Stage3end_substringS(hitarray[i]);
9864 sensedir = Stage3end_sensedir(hitarray[i]);
9865 if (Intlist_length(donor_amb_nmismatches) > 1 && Intlist_length(acceptor_amb_nmismatches) > 1) {
9866 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,/*acceptor*/NULL,shortexon,
9867 /*donor_prob*/Doublelist_max(donor_amb_probs),Substring_siteA_prob(shortexon),
9868 Substring_siteD_prob(shortexon),/*acceptor_prob*/Doublelist_max(acceptor_amb_probs),
9869 /*amb_length_donor*/donor_length,/*amb_length_acceptor*/acceptor_length,
9870 donor_ambcoords,acceptor_ambcoords,
9871 donor_amb_knowni,acceptor_amb_knowni,
9872 donor_amb_nmismatches,acceptor_amb_nmismatches,
9873 donor_amb_probs,acceptor_amb_probs,
9874 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/true,
9875 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
9876
9877 } else if (Intlist_length(donor_amb_nmismatches) > 1) {
9878 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,acceptor,shortexon,
9879 /*donor_prob*/Doublelist_max(donor_amb_probs),Substring_siteA_prob(shortexon),
9880 Substring_siteD_prob(shortexon),/*acceptor_prob*/Substring_chimera_prob(acceptor),
9881 /*amb_length_donor*/donor_length,/*amb_length_acceptor*/0,
9882 donor_ambcoords,/*acceptor_ambcoords*/NULL,
9883 donor_amb_knowni,/*amb_knowni_acceptor*/NULL,
9884 donor_amb_nmismatches,/*amb_nmismatches_acceptor*/NULL,
9885 donor_amb_probs,/*amb_probs_acceptor*/NULL,
9886 /*copy_donor_p*/false,/*copy_acceptor_p*/true,/*copy_shortexon_p*/true,
9887 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
9888
9889 } else if (Intlist_length(acceptor_amb_nmismatches) > 1) {
9890 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,/*acceptor*/NULL,shortexon,
9891 /*donor_prob*/Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
9892 Substring_siteD_prob(shortexon),/*acceptor_prob*/Doublelist_max(acceptor_amb_probs),
9893 /*amb_length_donor*/0,/*amb_length_acceptor*/acceptor_length,
9894 /*ambcoords_donor*/NULL,acceptor_ambcoords,
9895 /*amb_knowni_donor*/NULL,acceptor_amb_knowni,
9896 /*amb_nmismatches_donor*/NULL,acceptor_amb_nmismatches,
9897 /*amb_probs_donor*/NULL,acceptor_amb_probs,
9898 /*copy_donor_p*/true,/*copy_acceptor_p*/false,/*copy_shortexon_p*/true,
9899 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
9900
9901 } else {
9902 /* A singleton, apparently due to many duplicates. Is this possible? */
9903 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
9904 /*donor_prob*/Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
9905 Substring_siteD_prob(shortexon),/*acceptor_prob*/Substring_chimera_prob(acceptor),
9906 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
9907 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
9908 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
9909 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
9910 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
9911 /*copy_donor_p*/true,/*copy_acceptor_p*/true,/*copy_shortexon_p*/true,
9912 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
9913
9914 }
9915
9916 Doublelist_free(&donor_amb_probs);
9917 Intlist_free(&donor_amb_nmismatches);
9918 Intlist_free(&donor_amb_knowni);
9919 Doublelist_free(&acceptor_amb_probs);
9920 Intlist_free(&acceptor_amb_nmismatches);
9921 Intlist_free(&acceptor_amb_knowni);
9922 #ifdef LARGE_GENOMES
9923 Uint8list_free(&donor_ambcoords);
9924 Uint8list_free(&acceptor_ambcoords);
9925 #else
9926 Uintlist_free(&donor_ambcoords);
9927 Uintlist_free(&acceptor_ambcoords);
9928 #endif
9929 for (k = i; k < j; k++) {
9930 hit = hitarray[k];
9931 Stage3end_free(&hit);
9932 }
9933 }
9934
9935 i = j;
9936 }
9937 FREE(hitarray);
9938
9939 /* Process single ambiguous on donor side */
9940 hitarray = (Stage3end_T *) List_to_array_n(&n,single_ambig_hits);
9941 qsort(hitarray,n,sizeof(Stage3end_T),substringD_match_length_cmp);
9942 List_free(&single_ambig_hits);
9943 single_ambig_hits = (List_T) NULL;
9944
9945 i = 0;
9946 while (i < n) {
9947 hit = hitarray[i];
9948 donor = Stage3end_substringD(hit);
9949 donor_length = Substring_match_length_orig(donor);
9950 j = i + 1;
9951 while (j < n && Substring_match_length_orig(Stage3end_substringD(hitarray[j])) == donor_length) {
9952 j++;
9953 }
9954 if (j == i + 1) {
9955 /* Save for later analysis */
9956 single_ambig_hits = List_push(single_ambig_hits,(void *) hit);
9957 } else {
9958 acceptor_ambcoords = NULL;
9959 acceptor_amb_knowni = (Intlist_T) NULL;
9960 acceptor_amb_nmismatches = (Intlist_T) NULL;
9961 acceptor_amb_probs = (Doublelist_T) NULL;
9962
9963 for (k = i; k < j; k++) {
9964 acceptor = Stage3end_substringA(hitarray[k]);
9965 #ifdef LARGE_GENOMES
9966 acceptor_ambcoords = Uint8list_push(acceptor_ambcoords,Substring_splicecoord(acceptor));
9967 #else
9968 acceptor_ambcoords = Uintlist_push(acceptor_ambcoords,Substring_splicecoord(acceptor));
9969 #endif
9970 acceptor_amb_knowni = Intlist_push(acceptor_amb_knowni,-1);
9971 acceptor_amb_nmismatches = Intlist_push(acceptor_amb_nmismatches,Substring_nmismatches_whole(acceptor));
9972 acceptor_amb_probs = Doublelist_push(acceptor_amb_probs,Substring_chimera_prob(acceptor));
9973 }
9974
9975 shortexon = Stage3end_substringS(hitarray[i]);
9976 sensedir = Stage3end_sensedir(hitarray[i]);
9977 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,/*acceptor*/NULL,shortexon,
9978 /*donor_prob*/Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
9979 Substring_siteD_prob(shortexon),/*acceptor_prob*/Doublelist_max(acceptor_amb_probs),
9980 /*amb_length_donor*/0,/*amb_length_acceptor*/Substring_match_length_orig(acceptor),
9981 /*ambcoords_donor*/NULL,acceptor_ambcoords,
9982 /*amb_knowni_donor*/NULL,acceptor_amb_knowni,
9983 /*amb_nmismatches_donor*/NULL,acceptor_amb_nmismatches,
9984 /*amb_probs_donor*/NULL,acceptor_amb_probs,
9985 /*copy_donor_p*/true,/*copy_acceptor_p*/false,/*copy_shortexon_p*/true,
9986 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
9987 Doublelist_free(&acceptor_amb_probs);
9988 Intlist_free(&acceptor_amb_nmismatches);
9989 Intlist_free(&acceptor_amb_knowni);
9990 #ifdef LARGE_GENOMES
9991 Uint8list_free(&acceptor_ambcoords);
9992 #else
9993 Uintlist_free(&acceptor_ambcoords);
9994 #endif
9995 for (k = i; k < j; k++) {
9996 hit = hitarray[k];
9997 Stage3end_free(&hit);
9998 }
9999 }
10000
10001 i = j;
10002 }
10003 FREE(hitarray);
10004
10005 /* Process single ambiguous on acceptor side */
10006 hitarray = (Stage3end_T *) List_to_array_n(&n,single_ambig_hits);
10007 qsort(hitarray,n,sizeof(Stage3end_T),substringA_match_length_cmp);
10008 List_free(&single_ambig_hits);
10009
10010 i = 0;
10011 while (i < n) {
10012 hit = hitarray[i];
10013 acceptor = Stage3end_substringA(hit);
10014 acceptor_length = Substring_match_length_orig(acceptor);
10015 j = i + 1;
10016 while (j < n && Substring_match_length_orig(Stage3end_substringA(hitarray[j])) == acceptor_length) {
10017 j++;
10018 }
10019 if (j == i + 1) {
10020 /* Finally, a confirmed unique */
10021 hits = List_push(hits,(void *) hit);
10022 } else {
10023 donor_ambcoords = NULL;
10024 donor_amb_knowni = (Intlist_T) NULL;
10025 donor_amb_nmismatches = (Intlist_T) NULL;
10026 donor_amb_probs = (Doublelist_T) NULL;
10027
10028 for (k = i; k < j; k++) {
10029 donor = Stage3end_substringD(hitarray[k]);
10030 #ifdef LARGE_GENOMES
10031 donor_ambcoords = Uint8list_push(donor_ambcoords,Substring_splicecoord(donor));
10032 #else
10033 donor_ambcoords = Uintlist_push(donor_ambcoords,Substring_splicecoord(donor));
10034 #endif
10035 donor_amb_knowni = Intlist_push(donor_amb_knowni,-1);
10036 donor_amb_nmismatches = Intlist_push(donor_amb_nmismatches,Substring_nmismatches_whole(donor));
10037 donor_amb_probs = Doublelist_push(donor_amb_probs,Substring_chimera_prob(donor));
10038 }
10039
10040 shortexon = Stage3end_substringS(hitarray[i]);
10041 sensedir = Stage3end_sensedir(hitarray[i]);
10042 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,acceptor,shortexon,
10043 /*donor_prob*/Doublelist_max(donor_amb_probs),Substring_siteA_prob(shortexon),
10044 Substring_siteD_prob(shortexon),/*acceptor_prob*/Substring_chimera_prob(acceptor),
10045 /*amb_length_donor*/Substring_match_length_orig(donor),/*amb_length_acceptor*/0,
10046 donor_ambcoords,/*acceptor_ambcoords*/NULL,
10047 donor_amb_knowni,/*amb_knowni_acceptor*/NULL,
10048 donor_amb_nmismatches,/*amb_nmismatches_acceptor*/NULL,
10049 donor_amb_probs,/*amb_probs_acceptor*/NULL,
10050 /*copy_donor_p*/false,/*copy_acceptor_p*/true,/*copy_shortexon_p*/true,
10051 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10052 Doublelist_free(&donor_amb_probs);
10053 Intlist_free(&donor_amb_nmismatches);
10054 Intlist_free(&donor_amb_knowni);
10055 #ifdef LARGE_GENOMES
10056 Uint8list_free(&donor_ambcoords);
10057 #else
10058 Uintlist_free(&donor_ambcoords);
10059 #endif
10060 for (k = i; k < j; k++) {
10061 hit = hitarray[k];
10062 Stage3end_free(&hit);
10063 }
10064 }
10065
10066 i = j;
10067 }
10068 FREE(hitarray);
10069 }
10070 }
10071 }
10072
10073
10074 /* Short exon using known splicing, originally on plus strand */
10075 if (shortexon_orig_plusp == true) {
10076 debug4k(printf("Short exon candidate, orig_plusp. Saw short exon acceptor...donor on segment i\n"));
10077 sensedir = (plusp == true) ? SENSE_FORWARD : SENSE_ANTI;
10078 assert(plusp == true);
10079 assert(sensedir == SENSE_FORWARD);
10080
10081 for (j1 = joffset; j1 < j; j1++) {
10082 if (splicetypes[j1] == ACCEPTOR) {
10083 leftpos = splicesites[j1] - segmentm_left;
10084 debug4k(printf(" Doing Splicetrie_find_left from leftpos %d (plus)\n",leftpos));
10085 if ((splicesites_i_left =
10086 Splicetrie_find_left(&nmismatches_shortexon_left,&nmismatches_list_left,j1,
10087 /*origleft*/segmentm_left,/*pos5*/0,/*pos3*/leftpos,segmentm->chroffset,
10088 query_compress,queryptr,querylength,max_mismatches_allowed,/*plusp*/true,
10089 genestrand,first_read_p,
10090 /*collect_all_p*/pairedp == true && first_read_p != plusp)) != NULL) {
10091 ambp_left = (leftpos < min_shortend || Intlist_length(splicesites_i_left) > 1) ? true : false;
10092
10093 for (j2 = j1 + 1; j2 < j; j2++) {
10094 if (splicetypes[j2] == DONOR && splicesites[j2] > splicesites[j1]) {
10095 rightpos = splicesites[j2] - segmentm_left;
10096 debug4k(printf(" Doing Splicetrie_find_right from rightpos %d (plus)\n",rightpos));
10097 if ((nmismatches_shortexon_middle =
10098 Genome_count_mismatches_substring(query_compress,segmentm_left,/*pos5*/leftpos,/*pos3*/rightpos,
10099 plusp,genestrand)) <= max_mismatches_allowed - nmismatches_shortexon_left &&
10100 (splicesites_i_right =
10101 Splicetrie_find_right(&nmismatches_shortexon_right,&nmismatches_list_right,j2,
10102 /*origleft*/segmentm_left,/*pos5*/rightpos,/*pos3*/querylength,segmentm->chrhigh,
10103 query_compress,queryptr,
10104 max_mismatches_allowed - nmismatches_shortexon_left - nmismatches_shortexon_middle,
10105 /*plusp*/true,genestrand,first_read_p,
10106 /*collect_all_p*/pairedp == true && first_read_p == plusp)) != NULL) {
10107 ambp_right = (querylength - rightpos < min_shortend || Intlist_length(splicesites_i_right) > 1) ? true : false;
10108
10109 debug4k(printf(" donor %s ... acceptor %d (%llu) ... donor %d (%llu) ... acceptor %s: %d + %d + %d mismatches\n",
10110 Intlist_to_string(splicesites_i_left),j1,(unsigned long long) splicesites[j1],
10111 j2,(unsigned long long) splicesites[j2],Intlist_to_string(splicesites_i_right),
10112 nmismatches_shortexon_left,nmismatches_shortexon_middle,nmismatches_shortexon_right));
10113
10114 if (ambp_left == true && ambp_right == true) {
10115 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j1],/*acceptor_knowni*/j1,
10116 /*donor_coord*/splicesites[j2],/*donor_knowni*/j2,
10117 /*acceptor_pos*/leftpos,/*donor_pos*/rightpos,
10118 nmismatches_shortexon_middle,
10119 /*acceptor_prob*/2.0,/*donor_prob*/2.0,
10120 /*left*/segmentm_left,query_compress,
10121 querylength,/*plusp*/true,genestrand,
10122 sensedir,/*acceptor_ambp*/true,/*donor_ambp*/true,
10123 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10124 if (shortexon != NULL) {
10125 debug4k(printf("New one-third shortexon at left %llu\n",(unsigned long long) segmentm_left));
10126 ambcoords_donor = lookup_splicesites(&probs_donor,splicesites_i_left,splicesites);
10127 ambcoords_acceptor = lookup_splicesites(&probs_acceptor,splicesites_i_right,splicesites);
10128 amb_length_donor = leftpos /*- nmismatches_shortexon_left*/;
10129 amb_length_acceptor = querylength - rightpos /*- nmismatches_shortexon_right*/;
10130 segmentm->usedp = true;
10131 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,/*acceptor*/NULL,shortexon,
10132 Doublelist_max(probs_donor),Substring_siteA_prob(shortexon),
10133 Substring_siteD_prob(shortexon),Doublelist_max(probs_acceptor),
10134 amb_length_donor,amb_length_acceptor,
10135 ambcoords_donor,ambcoords_acceptor,
10136 /*amb_knowni_donor*/splicesites_i_left,/*amb_knowni_acceptor*/splicesites_i_right,
10137 /*amb_nmismatches_donor*/nmismatches_list_left,/*amb_nmismatches_acceptor*/nmismatches_list_right,
10138 /*amb_probs_donor*/probs_donor,/*amb_probs_acceptor*/probs_acceptor,
10139 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10140 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10141 Doublelist_free(&probs_donor);
10142 Doublelist_free(&probs_acceptor);
10143 #ifdef LARGE_GENOMES
10144 Uint8list_free(&ambcoords_donor);
10145 Uint8list_free(&ambcoords_acceptor);
10146 #else
10147 Uintlist_free(&ambcoords_donor);
10148 Uintlist_free(&ambcoords_acceptor);
10149 #endif
10150 }
10151
10152 } else if (ambp_left == true && ambp_right == false) {
10153 debug4k(printf("ambp_left true, ambp_right false\n"));
10154 best_right_j = Intlist_head(splicesites_i_right);
10155
10156 debug4k(printf("shortexon with amb_acceptor at %d (%llu) ... donor at %d (%llu)\n",
10157 j1,(unsigned long long) splicesites[j1],j2,(unsigned long long) splicesites[j2]));
10158 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j1],/*acceptor_knowni*/j1,
10159 /*donor_coord*/splicesites[j2],/*donor_knowni*/j2,
10160 /*acceptor_pos*/leftpos,/*donor_pos*/rightpos,
10161 nmismatches_shortexon_middle,
10162 /*acceptor_prob*/2.0,/*donor_prob*/2.0,
10163 /*left*/segmentm_left,query_compress,
10164 querylength,/*plusp*/true,genestrand,
10165 sensedir,/*acceptor_ambp*/true,/*donor_ambp*/false,
10166 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10167
10168 debug4k(printf("acceptor at %d (%llu)\n",best_right_j,(unsigned long long) splicesites[best_right_j]));
10169 acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[best_right_j],/*acceptor_knowni*/best_right_j,
10170 /*splice_pos*/rightpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10171 nmismatches_shortexon_right,
10172 /*prob*/2.0,/*left*/splicesites[best_right_j]-rightpos,
10173 query_compress,querylength,/*plusp*/true,genestrand,
10174 /*sensedir*/SENSE_FORWARD,segmentm->chrnum,
10175 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10176
10177 if (shortexon == NULL || acceptor == NULL) {
10178 if (shortexon != NULL) Substring_free(&shortexon);
10179 if (acceptor != NULL) Substring_free(&acceptor);
10180 } else {
10181 debug4k(printf("ambp_left true, ambp_right false: New two-thirds shortexon at left %llu\n",
10182 (unsigned long long) segmentm_left));
10183 ambcoords_donor = lookup_splicesites(&probs_donor,splicesites_i_left,splicesites);
10184 amb_length_donor = leftpos /*- nmismatches_shortexon_left*/;
10185 segmentm->usedp = true;
10186 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,acceptor,shortexon,
10187 Doublelist_max(probs_donor),Substring_siteA_prob(shortexon),
10188 Substring_siteD_prob(shortexon),Substring_chimera_prob(acceptor),
10189 amb_length_donor,/*amb_length_acceptor*/0,
10190 ambcoords_donor,/*ambcoords_acceptor*/NULL,
10191 /*amb_knowni_donor*/splicesites_i_left,/*amb_knowni_acceptor*/NULL,
10192 /*amb_nmismatches_donor*/nmismatches_list_left,/*amb_nmismatches_acceptor*/NULL,
10193 /*amb_probs_donor*/probs_donor,/*amb_probs_acceptor*/NULL,
10194 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10195 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10196 Doublelist_free(&probs_donor);
10197 #ifdef LARGE_GENOMES
10198 Uint8list_free(&ambcoords_donor);
10199 #else
10200 Uintlist_free(&ambcoords_donor);
10201 #endif
10202 }
10203
10204 } else if (ambp_left == false && ambp_right == true) {
10205 debug4k(printf("ambp_left false, ambp_right true\n"));
10206 best_left_j = Intlist_head(splicesites_i_left);
10207
10208 debug4k(printf("donor at %d (%llu)\n",best_left_j,(unsigned long long) splicesites[best_left_j]));
10209 donor = Substring_new_donor(/*donor_coord*/splicesites[best_left_j],/*donor_knowni*/best_left_j,
10210 /*splice_pos*/leftpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10211 nmismatches_shortexon_left,
10212 /*prob*/2.0,/*left*/splicesites[best_left_j]-leftpos,
10213 query_compress,querylength,/*plusp*/true,genestrand,
10214 /*sensedir*/SENSE_FORWARD,segmentm->chrnum,
10215 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10216
10217 debug4k(printf("shortexon with acceptor at %d (%llu) ... amb_donor %d (%llu)\n",
10218 j1,(unsigned long long) splicesites[j1],j2,(unsigned long long) splicesites[j2]));
10219 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j1],/*acceptor_knowni*/j1,
10220 /*donor_coord*/splicesites[j2],/*donor_knowni*/j2,
10221 /*acceptor_pos*/leftpos,/*donor_pos*/rightpos,
10222 nmismatches_shortexon_middle,
10223 /*acceptor_prob*/2.0,/*donor_prob*/2.0,
10224 /*left*/segmentm_left,query_compress,
10225 querylength,/*plusp*/true,genestrand,
10226 /*sensedir*/SENSE_FORWARD,/*acceptor_ambp*/false,/*donor_ambp*/true,
10227 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10228
10229 if (donor == NULL || shortexon == NULL) {
10230 if (donor != NULL) Substring_free(&donor);
10231 if (shortexon != NULL) Substring_free(&shortexon);
10232 } else {
10233 ambcoords_acceptor = lookup_splicesites(&probs_acceptor,splicesites_i_right,splicesites);
10234 amb_length_acceptor = querylength - rightpos /*- nmismatches_shortexon_right*/;
10235 segmentm->usedp = true;
10236 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,/*acceptor*/NULL,shortexon,
10237 Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
10238 Substring_siteD_prob(shortexon),Doublelist_max(probs_acceptor),
10239 /*amb_length_donor*/0,amb_length_acceptor,
10240 /*ambcoords_donor*/NULL,ambcoords_acceptor,
10241 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/splicesites_i_right,
10242 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/nmismatches_list_right,
10243 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/probs_acceptor,
10244 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10245 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10246 Doublelist_free(&probs_acceptor);
10247 #ifdef LARGE_GENOMES
10248 Uint8list_free(&ambcoords_acceptor);
10249 #else
10250 Uintlist_free(&ambcoords_acceptor);
10251 #endif
10252 }
10253
10254
10255 } else { /* ambp_left == false && ambp_right == false */
10256 debug4k(printf("ambp_left false, ambp_right false\n"));
10257 best_left_j = Intlist_head(splicesites_i_left);
10258 best_right_j = Intlist_head(splicesites_i_right);
10259 donor = Substring_new_donor(/*donor_coord*/splicesites[best_left_j],/*donor_knowni*/best_left_j,
10260 /*splice_pos*/leftpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10261 nmismatches_shortexon_left,
10262 /*prob*/2.0,/*left*/splicesites[best_left_j]-leftpos,
10263 query_compress,querylength,/*plusp*/true,genestrand,
10264 /*sensedir*/SENSE_FORWARD,segmentm->chrnum,
10265 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10266
10267 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j1],/*acceptor_knowni*/j1,
10268 /*donor_coord*/splicesites[j2],/*donor_knowni*/j2,
10269 /*acceptor_pos*/leftpos,/*donor_pos*/rightpos,
10270 nmismatches_shortexon_middle,/*acceptor_prob*/2.0,/*donor_prob*/2.0,
10271 /*left*/segmentm_left,query_compress,
10272 querylength,/*plusp*/true,genestrand,
10273 sensedir,/*acceptor_ambp*/false,/*donor_ambp*/false,
10274 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10275
10276 acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[best_right_j],/*acceptor_knowni*/best_right_j,
10277 /*splice_pos*/rightpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10278 nmismatches_shortexon_right,
10279 /*prob*/2.0,/*left*/splicesites[best_right_j]-rightpos,
10280 query_compress,querylength,/*plusp*/true,genestrand,
10281 /*sensedir*/SENSE_FORWARD,segmentm->chrnum,
10282 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10283
10284 if (donor == NULL || shortexon == NULL || acceptor == NULL) {
10285 if (donor != NULL) Substring_free(&donor);
10286 if (shortexon != NULL) Substring_free(&shortexon);
10287 if (acceptor != NULL) Substring_free(&acceptor);
10288 } else {
10289 debug4k(printf("New shortexon at left %llu\n",(unsigned long long) segmentm_left));
10290 segmentm->usedp = true;
10291 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
10292 Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
10293 Substring_siteD_prob(shortexon),Substring_chimera_prob(acceptor),
10294 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
10295 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
10296 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
10297 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
10298 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
10299 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10300 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10301 }
10302 }
10303 Intlist_free(&nmismatches_list_right);
10304 Intlist_free(&splicesites_i_right);
10305 }
10306 }
10307 }
10308 Intlist_free(&nmismatches_list_left);
10309 Intlist_free(&splicesites_i_left);
10310 }
10311 }
10312 }
10313 debug4k(printf("End of case 1\n"));
10314 }
10315
10316 /* Short exon using known splicing, originally on minus strand */
10317 if (shortexon_orig_minusp == true) {
10318 debug4k(printf("Short exon candidate, orig_minusp. Saw short exon antidonor...antiacceptor on segment i\n"));
10319 sensedir = (plusp == true) ? SENSE_ANTI : SENSE_FORWARD;
10320 assert(plusp == false);
10321 assert(sensedir == SENSE_ANTI);
10322
10323 for (j1 = joffset; j1 < j; j1++) {
10324 if (splicetypes[j1] == ANTIDONOR) {
10325 leftpos = splicesites[j1] - segmentm_left;
10326 debug4k(printf(" Doing Splicetrie_find_left from leftpos %d (minus)\n",leftpos));
10327 if ((splicesites_i_left =
10328 Splicetrie_find_left(&nmismatches_shortexon_left,&nmismatches_list_left,j1,
10329 /*origleft*/segmentm_left,/*pos5*/0,/*pos3*/leftpos,segmentm->chroffset,
10330 query_compress,queryptr,querylength,max_mismatches_allowed,
10331 /*plusp*/false,genestrand,first_read_p,
10332 /*collect_all_p*/pairedp == true && first_read_p != plusp)) != NULL) {
10333 ambp_left = (leftpos < min_shortend || Intlist_length(splicesites_i_left) > 1) ? true : false;
10334
10335 for (j2 = j1 + 1; j2 < j; j2++) {
10336 if (splicetypes[j2] == ANTIACCEPTOR && splicesites[j2] > splicesites[j1]) {
10337 rightpos = splicesites[j2] - segmentm_left;
10338 debug4k(printf(" Doing Splicetrie_find_right from rightpos %d (minus)\n",rightpos));
10339 if ((nmismatches_shortexon_middle =
10340 Genome_count_mismatches_substring(query_compress,segmentm_left,/*pos5*/leftpos,/*pos3*/rightpos,
10341 /*plusp*/false,genestrand)) <= max_mismatches_allowed - nmismatches_shortexon_left &&
10342 (splicesites_i_right =
10343 Splicetrie_find_right(&nmismatches_shortexon_right,&nmismatches_list_right,j2,
10344 /*origleft*/segmentm_left,/*pos5*/rightpos,/*pos3*/querylength,segmentm->chrhigh,
10345 query_compress,queryptr,
10346 max_mismatches_allowed - nmismatches_shortexon_left - nmismatches_shortexon_middle,
10347 /*plusp*/false,genestrand,first_read_p,
10348 /*collect_all_p*/pairedp == true && first_read_p == plusp)) != NULL) {
10349 ambp_right = (querylength - rightpos < min_shortend || Intlist_length(splicesites_i_right) > 1) ? true : false;
10350
10351 debug4k(printf(" antiacceptor %s ... antidonor %d (%llu) ... antiacceptor %d (%llu) ... antidonor %s: %d + %d + %d mismatches\n",
10352 Intlist_to_string(splicesites_i_left),j1,(unsigned long long) splicesites[j1],
10353 j2,(unsigned long long) splicesites[j2],Intlist_to_string(splicesites_i_right),
10354 nmismatches_shortexon_left,nmismatches_shortexon_middle,nmismatches_shortexon_right));
10355
10356 if (ambp_left == true && ambp_right == true) {
10357 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j2],/*acceptor_knowni*/j2,
10358 /*donor_coord*/splicesites[j1],/*donor_knowni*/j1,
10359 /*acceptor_pos*/rightpos,/*donor_pos*/leftpos,nmismatches_shortexon_middle,
10360 /*acceptor_prob*/2.0,/*donor_prob*/2.0,
10361 /*left*/segmentm_left,query_compress,
10362 querylength,/*plusp*/false,genestrand,
10363 sensedir,/*acceptor_ambp*/true,/*donor_ambp*/true,
10364 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10365 if (shortexon != NULL) {
10366 debug4k(printf("New one-third shortexon at left %llu\n",(unsigned long long) segmentm_left));
10367 ambcoords_donor = lookup_splicesites(&probs_donor,splicesites_i_right,splicesites);
10368 ambcoords_acceptor = lookup_splicesites(&probs_acceptor,splicesites_i_left,splicesites);
10369 amb_length_donor = querylength - rightpos /*- nmismatches_shortexon_right*/;
10370 amb_length_acceptor = leftpos /*- nmismatches_shortexon_left*/;
10371 segmentm->usedp = true;
10372 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,/*acceptor*/NULL,shortexon,
10373 Doublelist_max(probs_donor),Substring_siteA_prob(shortexon),
10374 Substring_siteD_prob(shortexon),Doublelist_max(probs_acceptor),
10375 amb_length_donor,amb_length_acceptor,
10376 ambcoords_donor,ambcoords_acceptor,
10377 /*amb_knowni_donor*/splicesites_i_right,/*amb_knowni_acceptor*/splicesites_i_left,
10378 /*amb_nmismatches_donor*/nmismatches_list_right,/*amb_nmismatches_acceptor*/nmismatches_list_left,
10379 /*amb_probs_donor*/probs_donor,/*amb_probs_acceptor*/probs_acceptor,
10380 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10381 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10382 Doublelist_free(&probs_donor);
10383 Doublelist_free(&probs_acceptor);
10384 #ifdef LARGE_GENOMES
10385 Uint8list_free(&ambcoords_donor);
10386 Uint8list_free(&ambcoords_acceptor);
10387 #else
10388 Uintlist_free(&ambcoords_donor);
10389 Uintlist_free(&ambcoords_acceptor);
10390 #endif
10391 }
10392
10393 } else if (ambp_left == true && ambp_right == false) {
10394 debug4k(printf("ambp_left true, ambp_right false\n"));
10395 best_right_j = Intlist_head(splicesites_i_right);
10396
10397 debug4k(printf("shortexon with amb_donor at %d (%llu) ... acceptor at %d (%llu)\n",
10398 j1,(unsigned long long) splicesites[j1],j2,(unsigned long long) splicesites[j2]));
10399 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j2],/*acceptor_knowni*/j2,
10400 /*donor_coord*/splicesites[j1],/*donor_knowni*/j1,
10401 /*acceptor_pos*/rightpos,/*donor_pos*/leftpos,nmismatches_shortexon_middle,
10402 /*acceptor_prob*/2.0,/*donor_prob*/2.0,
10403 /*left*/segmentm_left,query_compress,
10404 querylength,/*plusp*/false,genestrand,
10405 sensedir,/*acceptor_ambp*/false,/*donor_ambp*/true,
10406 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10407
10408 debug4k(printf("donor at %d (%llu)\n",best_right_j,(unsigned long long) splicesites[best_right_j]));
10409 donor = Substring_new_donor(/*donor_coord*/splicesites[best_right_j],/*donor_knowni*/best_right_j,
10410 /*splice_pos*/rightpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10411 nmismatches_shortexon_right,
10412 /*prob*/2.0,/*left*/splicesites[best_right_j]-rightpos,
10413 query_compress,querylength,/*plusp*/false,genestrand,
10414 /*sensedir*/SENSE_ANTI,segmentm->chrnum,
10415 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10416
10417 if (donor == NULL || shortexon == NULL) {
10418 if (donor != NULL) Substring_free(&donor);
10419 if (shortexon != NULL) Substring_free(&shortexon);
10420 } else {
10421 ambcoords_acceptor = lookup_splicesites(&probs_acceptor,splicesites_i_left,splicesites);
10422 amb_length_acceptor = leftpos /*- nmismatches_shortexon_left*/;
10423 segmentm->usedp = true;
10424 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,/*acceptor*/NULL,shortexon,
10425 Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
10426 Substring_siteD_prob(shortexon),Doublelist_max(probs_acceptor),
10427 /*amb_length_donor*/0,amb_length_acceptor,
10428 /*ambcoords_donor*/NULL,ambcoords_acceptor,
10429 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/splicesites_i_left,
10430 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/nmismatches_list_left,
10431 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/probs_acceptor,
10432 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10433 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10434 Doublelist_free(&probs_acceptor);
10435 #ifdef LARGE_GENOMES
10436 Uint8list_free(&ambcoords_acceptor);
10437 #else
10438 Uintlist_free(&ambcoords_acceptor);
10439 #endif
10440 }
10441
10442 } else if (ambp_left == false && ambp_right == true) {
10443 debug4k(printf("ambp_left false, ambp_right true\n"));
10444 best_left_j = Intlist_head(splicesites_i_left);
10445
10446 debug4k(printf("acceptor at %d (%llu)\n",best_left_j,(unsigned long long) splicesites[best_left_j]));
10447 acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[best_left_j],/*acceptor_knowni*/best_left_j,
10448 /*splice_pos*/leftpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10449 nmismatches_shortexon_left,
10450 /*prob*/2.0,/*left*/splicesites[best_left_j]-leftpos,
10451 query_compress,querylength,/*plusp*/false,genestrand,
10452 /*sensedir*/SENSE_ANTI,segmentm->chrnum,
10453 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10454
10455 debug4k(printf("shortexon with donor at %d (%llu) ... amb_acceptor at %d (%llu)\n",
10456 j2,(unsigned long long) splicesites[j2],j1,(unsigned long long) splicesites[j1]));
10457 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j2],/*acceptor_knowni*/j2,
10458 /*donor_coord*/splicesites[j1],/*donor_knowni*/j1,
10459 /*acceptor_pos*/rightpos,/*donor_pos*/leftpos,nmismatches_shortexon_middle,
10460 /*acceptor_prob*/2.0,/*donor_prob*/2.0,
10461 /*left*/segmentm_left,query_compress,
10462 querylength,/*plusp*/false,genestrand,
10463 sensedir,/*acceptor_ambp*/true,/*donor_ambp*/false,
10464 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10465
10466 if (shortexon == NULL || acceptor == NULL) {
10467 if (shortexon != NULL) Substring_free(&shortexon);
10468 if (acceptor != NULL) Substring_free(&acceptor);
10469 } else {
10470 debug4k(printf("ambp_left false, ambp_right true: New splice at left %llu\n",
10471 (unsigned long long) segmentm_left));
10472 ambcoords_donor = lookup_splicesites(&probs_donor,splicesites_i_right,splicesites);
10473 amb_length_donor = querylength - rightpos /*- nmismatches_shortexon_right*/;
10474 segmentm->usedp = true;
10475 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),/*donor*/NULL,acceptor,shortexon,
10476 Doublelist_max(probs_donor),Substring_siteA_prob(shortexon),
10477 Substring_siteD_prob(shortexon),Substring_chimera_prob(acceptor),
10478 amb_length_donor,/*amb_length_acceptor*/0,
10479 ambcoords_donor,/*ambcoords_acceptor*/NULL,
10480 /*amb_knowni_donor*/splicesites_i_right,/*amb_knowni_acceptor*/NULL,
10481 /*amb_nmismatches_donor*/nmismatches_list_right,/*amb_nmismatches_acceptor*/NULL,
10482 /*amb_probs_donor*/probs_donor,/*amb_probs_acceptor*/NULL,
10483 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10484 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10485 Doublelist_free(&probs_donor);
10486 #ifdef LARGE_GENOMES
10487 Uint8list_free(&ambcoords_donor);
10488 #else
10489 Uintlist_free(&ambcoords_donor);
10490 #endif
10491 }
10492
10493 } else { /* ambp_left == false && ambp_right == false */
10494 best_left_j = Intlist_head(splicesites_i_left);
10495 best_right_j = Intlist_head(splicesites_i_right);
10496 acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[best_left_j],/*acceptor_knowni*/best_left_j,
10497 /*splice_pos*/leftpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10498 nmismatches_shortexon_left,
10499 /*prob*/2.0,/*left*/splicesites[best_left_j]-leftpos,
10500 query_compress,querylength,/*plusp*/false,genestrand,
10501 /*sensedir*/SENSE_ANTI,segmentm->chrnum,
10502 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10503
10504 shortexon = Substring_new_shortexon(/*acceptor_coord*/splicesites[j2],/*acceptor_knowni*/j2,
10505 /*donor_coord*/splicesites[j1],/*donor_knowni*/j1,
10506 /*acceptor_pos*/rightpos,/*donor_pos*/leftpos,
10507 nmismatches_shortexon_middle,/*acceptor_prob*/2.0,/*donor_prob*/2.0,
10508 /*left*/segmentm_left,query_compress,
10509 querylength,/*plusp*/false,genestrand,
10510 sensedir,/*acceptor_ambp*/false,/*donor_ambp*/false,
10511 segmentm->chrnum,segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10512
10513 donor = Substring_new_donor(/*donor_coord*/splicesites[best_right_j],/*donor_knowni*/best_right_j,
10514 /*splice_pos*/rightpos,/*substring_querystart*/0,/*substring_queryend*/querylength,
10515 nmismatches_shortexon_right,
10516 /*prob*/2.0,/*left*/splicesites[best_right_j]-rightpos,
10517 query_compress,querylength,/*plusp*/false,genestrand,
10518 /*sensedir*/SENSE_ANTI,segmentm->chrnum,
10519 segmentm->chroffset,segmentm->chrhigh,segmentm->chrlength);
10520
10521 if (acceptor == NULL || shortexon == NULL || donor == NULL) {
10522 if (acceptor != NULL) Substring_free(&acceptor);
10523 if (shortexon != NULL) Substring_free(&shortexon);
10524 if (donor != NULL) Substring_free(&donor);
10525 } else {
10526 debug4k(printf("New shortexon at left %llu\n",(unsigned long long) segmentm_left));
10527 segmentm->usedp = true;
10528 hits = List_push(hits,(void *) Stage3end_new_shortexon(&(*found_score),donor,acceptor,shortexon,
10529 Substring_chimera_prob(donor),Substring_siteA_prob(shortexon),
10530 Substring_siteD_prob(shortexon),Substring_chimera_prob(acceptor),
10531 /*amb_length_donor*/0,/*amb_length_acceptor*/0,
10532 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
10533 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
10534 /*amb_nmismatches_donor*/NULL,/*amb_nmismatches_acceptor*/NULL,
10535 /*amb_probs_donor*/NULL,/*amb_probs_acceptor*/NULL,
10536 /*copy_donor_p*/false,/*copy_acceptor_p*/false,/*copy_shortexon_p*/false,
10537 splicing_penalty,querylength,sensedir,/*sarrayp*/false));
10538 }
10539 }
10540 Intlist_free(&nmismatches_list_right);
10541 Intlist_free(&splicesites_i_right);
10542 }
10543 }
10544 }
10545 Intlist_free(&nmismatches_list_left);
10546 Intlist_free(&splicesites_i_left);
10547 }
10548 }
10549 }
10550 debug4k(printf("End of case 2\n"));
10551 }
10552 /* End of known splicesites, segment i */
10553 }
10554 }
10555
10556 debug4k(printf("Finished find_known_doublesplices with %d hits\n",List_length(hits)));
10557 return hits;
10558 }
10559 #endif
10560
10561
105629550
105639551 static void
105649552 find_spliceends_shortend (List_T **shortend_donors, List_T **shortend_antidonors,
105799567 int nmismatches, jstart, jend, j;
105809568 int splice_pos;
105819569
10582 #ifdef HAVE_ALLOCA
10583 int *mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
10584 #else
10585 int mismatch_positions[MAX_READLENGTH+1];
10586 #endif
10587
105889570 int nmismatches_left, nmismatches_right;
105899571 int *floors_from_neg3, *floors_to_pos3;
105909572 int sensedir;
105929574 int splice_pos_start, splice_pos_end;
105939575 #ifdef DEBUG4E
105949576 int i;
9577 #endif
9578
9579 int *mismatch_positions;
9580
9581 #ifdef HAVE_ALLOCA
9582 if (querylength <= MAX_STACK_READLENGTH) {
9583 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
9584 } else {
9585 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
9586 }
9587 #else
9588 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
105959589 #endif
105969590
105979591 debug4e(printf("Entering find_spliceends_shortend with %d anchor segments\n",nanchors));
106839677 sensedir,segment->chrnum,segment->chroffset,
106849678 segment->chrhigh,segment->chrlength)) != NULL) {
106859679 debug4e(printf("=> %s donor: known at %d (%d mismatches)\n",
10686 plusp == true ? "plus" : "minus",Substring_chimera_pos(hit),nmismatches));
9680 plusp == true ? "plus" : "minus",Substring_siteD_pos(hit),nmismatches));
106879681 (*shortend_donors)[nmismatches] = List_push((*shortend_donors)[nmismatches],(void *) hit);
106889682 }
106899683
107009694 sensedir,segment->chrnum,segment->chroffset,
107019695 segment->chrhigh,segment->chrlength)) != NULL) {
107029696 debug4e(printf("=> %s antiacceptor : known at %d (%d mismatches)\n",
10703 plusp == true ? "plus" : "minus",Substring_chimera_pos(hit),nmismatches));
9697 plusp == true ? "plus" : "minus",Substring_siteA_pos(hit),nmismatches));
107049698 (*shortend_antiacceptors)[nmismatches] = List_push((*shortend_antiacceptors)[nmismatches],(void *) hit);
107059699 }
107069700 }
107729766 sensedir,segment->chrnum,segment->chroffset,
107739767 segment->chrhigh,segment->chrlength)) != NULL) {
107749768 debug4e(printf("=> %s acceptor: known at %d (%d mismatches)\n",
10775 plusp == true ? "plus" : "minus",Substring_chimera_pos(hit),nmismatches));
9769 plusp == true ? "plus" : "minus",Substring_siteA_pos(hit),nmismatches));
107769770 (*shortend_acceptors)[nmismatches] = List_push((*shortend_acceptors)[nmismatches],(void *) hit);
107779771 }
107789772
107899783 sensedir,segment->chrnum,segment->chroffset,
107909784 segment->chrhigh,segment->chrlength)) != NULL) {
107919785 debug4e(printf("=> %s antidonor: known at %d (%d mismatches)\n",
10792 plusp == true ? "plus" : "minus",Substring_chimera_pos(hit),nmismatches));
9786 plusp == true ? "plus" : "minus",Substring_siteD_pos(hit),nmismatches));
107939787 (*shortend_antidonors)[nmismatches] = List_push((*shortend_antidonors)[nmismatches],(void *) hit);
107949788 }
107959789 }
107999793 }
108009794 }
108019795
9796 #ifdef HAVE_ALLOCA
9797 if (querylength <= MAX_STACK_READLENGTH) {
9798 FREEA(mismatch_positions);
9799 } else {
9800 FREE(mismatch_positions);
9801 }
9802 #else
9803 FREE(mismatch_positions);
9804 #endif
9805
108029806 return;
108039807 }
108049808
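The hunks above replace fixed-size `MAX_READLENGTH` arrays with a buffer that lives on the stack for short reads and on the heap for long ones, and the release at the end of the function mirrors the same test. A minimal standalone sketch of that pattern follows. It is not the GMAP code: `STACK_LIMIT` and `count_mismatches` are hypothetical names, the limit is kept tiny so both paths are easy to exercise, plain `alloca`/`malloc`/`free` stand in for the project's `ALLOCA`/`MALLOC`/`FREEA`/`FREE` macros, and `<alloca.h>` assumes a glibc-style platform.

```c
#include <alloca.h>   /* assumption: glibc/Linux; other platforms declare alloca elsewhere */
#include <assert.h>
#include <stdlib.h>

#define STACK_LIMIT 8  /* hypothetical stand-in for MAX_STACK_READLENGTH */

/* Counts positions where query and genome differ over querylength characters.
   The scratch buffer comes from the stack for short reads and from the heap
   for long ones.  Returns the mismatch count, or -1 if malloc fails. */
int
count_mismatches (const char *query, const char *genome, int querylength) {
  int *mismatch_positions;
  int nmismatches = 0, i;
  int on_heap;

  assert(querylength >= 0);
  on_heap = (querylength > STACK_LIMIT);

  /* alloca must be called in the function that uses the buffer, because
     the memory is released automatically when this function returns. */
  if (on_heap) {
    mismatch_positions = (int *) malloc((querylength+1)*sizeof(int));
    if (mismatch_positions == NULL) {
      return -1;
    }
  } else {
    mismatch_positions = (int *) alloca((querylength+1)*sizeof(int));
  }

  for (i = 0; i < querylength; i++) {
    if (query[i] != genome[i]) {
      mismatch_positions[nmismatches++] = i;
    }
  }
  mismatch_positions[nmismatches] = querylength; /* sentinel, hence the +1 */

  /* Only the heap buffer needs an explicit release; the decision must use
     the same condition as the allocation. */
  if (on_heap) {
    free(mismatch_positions);
  }
  return nmismatches;
}
```

Calling `count_mismatches("ACGT","ACGA",4)` takes the stack path, while a 12-character read exceeds the illustrative limit and takes the heap path. Keeping the allocation and release conditions identical is the important design point: freeing a stack buffer, or leaking a heap one, are the two failure modes this pattern invites.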
108269830 int *floors_from_neg3, *floors_to_pos3;
108279831
108289832 int splice_pos_start, splice_pos_end;
9833 int *mismatch_positions;
108299834
108309835 #ifdef HAVE_ALLOCA
10831 int *mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
10832 #else
10833 int mismatch_positions[MAX_READLENGTH+1];
10834 #endif
10835
9836 if (querylength <= MAX_STACK_READLENGTH) {
9837 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
9838 } else {
9839 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
9840 }
9841 #else
9842 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
9843 #endif
108369844
108379845 debug4e(printf("Entering find_spliceends_distant_dna with %d anchor segments\n",nanchors));
108389846
109009908 querylength,/*plusp*/true,genestrand,
109019909 segment->chrnum,segment->chroffset,
109029910 segment->chrhigh,segment->chrlength)) != NULL) {
10903 debug4e(printf("=> plus startfrag: at %d (%d mismatches)\n",Substring_chimera_pos(hit),nmismatches));
9911 debug4e(printf("=> plus startfrag: at %d (%d mismatches)\n",Substring_siteN_pos(hit),nmismatches));
109049912 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
109059913 (*distant_startfrags)[nmismatches] = List_push((*distant_startfrags)[nmismatches],(void *) hit);
109069914 }
109569964 querylength,/*plusp*/true,genestrand,
109579965 segment->chrnum,segment->chroffset,
109589966 segment->chrhigh,segment->chrlength)) != NULL) {
10959 debug4e(printf("=> plus endfrag: at %d (%d mismatches)\n",Substring_chimera_pos(hit),nmismatches));
9967 debug4e(printf("=> plus endfrag: at %d (%d mismatches)\n",Substring_siteN_pos(hit),nmismatches));
109609968 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
109619969 (*distant_endfrags)[nmismatches] = List_push((*distant_endfrags)[nmismatches],(void *) hit);
109629970 }
109719979 }
109729980 }
109739981 }
9982
9983 #ifdef HAVE_ALLOCA
9984 if (querylength <= MAX_STACK_READLENGTH) {
9985 FREEA(mismatch_positions);
9986 } else {
9987 FREE(mismatch_positions);
9988 }
9989 #else
9990 FREE(mismatch_positions);
9991 #endif
109749992
109759993 return;
109769994 }
1099910017 int *floors_from_neg3, *floors_to_pos3;
1100010018
1100110019 int splice_pos_start, splice_pos_end;
10020 int *mismatch_positions;
1100210021
1100310022 #ifdef HAVE_ALLOCA
11004 int *mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
11005 #else
11006 int mismatch_positions[MAX_READLENGTH+1];
10023 if (querylength <= MAX_STACK_READLENGTH) {
10024 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
10025 } else {
10026 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
10027 }
10028 #else
10029 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
1100710030 #endif
1100810031
1100910032
1107310096 querylength,/*plusp*/false,genestrand,
1107410097 segment->chrnum,segment->chroffset,
1107510098 segment->chrhigh,segment->chrlength)) != NULL) {
11076 debug4e(printf("=> minus endfrag: at %d (%d mismatches)\n",Substring_chimera_pos(hit),nmismatches));
10099 debug4e(printf("=> minus endfrag: at %d (%d mismatches)\n",Substring_siteN_pos(hit),nmismatches));
1107710100 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1107810101 (*distant_endfrags)[nmismatches] = List_push((*distant_endfrags)[nmismatches],(void *) hit);
1107910102 }
1112910152 querylength,/*plusp*/false,genestrand,
1113010153 segment->chrnum,segment->chroffset,
1113110154 segment->chrhigh,segment->chrlength)) != NULL) {
11132 debug4e(printf("=> minus startfrag: at %d (%d mismatches)\n",Substring_chimera_pos(hit),nmismatches));
10155 debug4e(printf("=> minus startfrag: at %d (%d mismatches)\n",Substring_siteN_pos(hit),nmismatches));
1113310156 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1113410157 (*distant_startfrags)[nmismatches] = List_push((*distant_startfrags)[nmismatches],(void *) hit);
1113510158 }
1114310166 }
1114410167 }
1114510168 }
10169
10170 #ifdef HAVE_ALLOCA
10171 if (querylength <= MAX_STACK_READLENGTH) {
10172 FREEA(mismatch_positions);
10173 } else {
10174 FREE(mismatch_positions);
10175 }
10176 #else
10177 FREE(mismatch_positions);
10178 #endif
1114610179
1114710180 return;
1114810181 }
1117910212 int sensedir;
1118010213
1118110214 int splice_pos_start, splice_pos_end;
11182
11183 #ifdef HAVE_ALLOCA
11184 int *mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
11185 int *segment_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11186 int *segment_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11187 int *segment_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11188 int *segment_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11189 int *segment_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11190 int *segment_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11191 int *segment_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11192 int *segment_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11193 int *positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
11194 int *knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
11195 #else
11196 int mismatch_positions[MAX_READLENGTH+1];
11197 int segment_donor_knownpos[MAX_READLENGTH+1], segment_acceptor_knownpos[MAX_READLENGTH+1];
11198 int segment_antidonor_knownpos[MAX_READLENGTH+1], segment_antiacceptor_knownpos[MAX_READLENGTH+1];
11199 int segment_donor_knowni[MAX_READLENGTH+1], segment_acceptor_knowni[MAX_READLENGTH+1];
11200 int segment_antidonor_knowni[MAX_READLENGTH+1], segment_antiacceptor_knowni[MAX_READLENGTH+1];
11201 int positions_alloc[MAX_READLENGTH+1];
11202 int knowni_alloc[MAX_READLENGTH+1];
11203 #endif
11204
1120510215 int segment_donor_nknown, segment_acceptor_nknown, segment_antidonor_nknown, segment_antiacceptor_nknown;
1120610216 int donori_nsites, acceptorj_nsites, antiacceptori_nsites, antidonorj_nsites;
1120710217 int *donori_positions, *acceptorj_positions, *antiacceptori_positions, *antidonorj_positions;
1120810218 int *donori_knowni, *acceptorj_knowni, *antiacceptori_knowni, *antidonorj_knowni;
1120910219
10220 int *mismatch_positions;
10221 int *segment_donor_knownpos, *segment_acceptor_knownpos, *segment_antidonor_knownpos, *segment_antiacceptor_knownpos,
10222 *segment_donor_knowni, *segment_acceptor_knowni, *segment_antidonor_knowni, *segment_antiacceptor_knowni;
10223 int *positions_alloc, *knowni_alloc;
10224
10225 #ifdef HAVE_ALLOCA
10226 if (querylength <= MAX_STACK_READLENGTH) {
10227 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
10228 segment_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
10229 segment_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
10230 segment_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
10231 segment_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
10232 segment_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
10233 segment_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
10234 segment_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
10235 segment_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
10236 positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
10237 knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
10238 } else {
10239 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
10240 segment_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10241 segment_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10242 segment_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10243 segment_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10244 segment_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10245 segment_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10246 segment_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10247 segment_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10248 positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
10249 knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
10250 }
10251 #else
10252 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
10253 segment_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10254 segment_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10255 segment_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10256 segment_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
10257 segment_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10258 segment_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10259 segment_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10260 segment_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
10261 positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
10262 knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
10263 #endif
1121010264
1121110265 debug4e(printf("Entering find_spliceends_distant_rna with %d anchor segments\n",nanchors));
1121210266
1121910273 assert(segment->diagonal != (Univcoord_T) -1);
1122010274
1122110275 segment_left = segment->diagonal - querylength; /* FORMULA: Corresponds to querypos 0 */
11222 last_querypos = segment->querypos3 + index1part;
10276 if ((first_querypos = segment->querypos5 - (index1interval - 1)) < 0) {
10277 first_querypos = 0;
10278 }
10279 if ((last_querypos = segment->querypos3 + index1part + (index1interval - 1)) > querylength) {
10280 last_querypos = querylength;
10281 }
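The two clamping statements above can be read in isolation as the following minimal sketch. It assumes only what the diff shows: the anchor's query range is widened by (index1interval - 1) on each side, then clamped to [0, querylength]. The struct QueryRange and the function extend_and_clamp are hypothetical names introduced for illustration and do not appear in the source.

```c
#include <assert.h>

/* Hypothetical container for the widened range; not part of the source. */
typedef struct {
  int first_querypos;
  int last_querypos;
} QueryRange;

/* Widens [querypos5, querypos3 + index1part] by (index1interval - 1) on
   both sides, then clamps the result to [0, querylength], mirroring the
   two if-statements in the diff. */
QueryRange extend_and_clamp(int querypos5, int querypos3, int index1part,
                            int index1interval, int querylength) {
  QueryRange r;

  assert(index1interval >= 1);

  r.first_querypos = querypos5 - (index1interval - 1);
  if (r.first_querypos < 0) {
    r.first_querypos = 0;
  }

  r.last_querypos = querypos3 + index1part + (index1interval - 1);
  if (r.last_querypos > querylength) {
    r.last_querypos = querylength;
  }

  return r;
}
```

With index1interval equal to 1 the widening term is zero, so the result reduces to the pre-change values segment->querypos5 and segment->querypos3 + index1part, unless the latter exceeds querylength.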
1122310282
1122410283 debug4e(printf("find_spliceends_distant_rna: Checking up to %d mismatches at diagonal %llu (querypos %d..%d) - querylength %d = %llu, floors %d and %d, plusp %d\n",
1122510284 max_mismatches_allowed,(unsigned long long) segment->diagonal,
1123810297 /* Find splices on genomic right */
1123910298 if (plusp) {
1124010299 /* ? require that floors_from_neg3[segment->querypos5] <= max_mismatches_allowed */
11241 if (segment->querypos5 < index1part && last_querypos < query_lastpos) {
10300 if (first_querypos < index1part && last_querypos < query_lastpos) {
1124210301 /* genomic left anchor */
1124310302 debug4e(printf("Searching genomic right: plus genomic left anchor\n"));
1124410303 nmismatches_left = Genome_mismatches_left(mismatch_positions,max_mismatches_allowed,
1124610305 #if 0
1124710306 /*pos5*/0,/*pos3*/querylength,
1124810307 #else
11249 /*pos5 (was 0)*/segment->querypos5,/*pos3*/querylength,
10308 /*pos5 (was 0)*/first_querypos,/*pos3*/querylength,
1125010309 #endif
1125110310 plusp,genestrand);
1125210311 debug4e(
1126110320 #if 0
1126210321 splice_pos_start = index1part;
1126310322 #else
11264 splice_pos_start = segment->querypos5;
10323 splice_pos_start = first_querypos;
1126510324 #endif
1126610325 if (nmismatches_left <= max_mismatches_allowed) {
1126710326 splice_pos_end = querylength - 1;
1126910328 splice_pos_end = querylength - 1;
1127010329 }
1127110330
11272 } else if (segment->querypos5 > index1part && last_querypos > query_lastpos) {
10331 } else if (first_querypos > index1part && last_querypos > query_lastpos) {
1127310332 /* genomic right anchor. No need to find splices on genomic right */
1127410333 debug4e(printf("Searching genomic right: plus genomic right anchor\n"));
1127510334 splice_pos_start = querylength;
1127610335 splice_pos_end = 0;
1127710336
11278 } else if (segment->querypos5 > index1part && last_querypos < query_lastpos &&
10337 } else if (first_querypos > index1part && last_querypos < query_lastpos &&
1127910338 segment->spliceable_low_p == true) {
1128010339 /* middle anchor */
1128110340 debug4e(printf("Searching genomic right: plus middle anchor\n"));
1128210341 nmismatches_left = Genome_mismatches_left(mismatch_positions,max_mismatches_allowed,
1128310342 query_compress,/*left*/segment_left,
11284 /*pos5*/segment->querypos5,/*pos3*/querylength,
10343 /*pos5*/first_querypos,/*pos3*/querylength,
1128510344 plusp,genestrand);
1128610345 debug4e(
1128710346 printf("%d mismatches on left (%d allowed) at:",
1129210351 printf("\n");
1129310352 );
1129410353
11295 splice_pos_start = segment->querypos5;
10354 splice_pos_start = first_querypos;
1129610355 if (nmismatches_left <= max_mismatches_allowed) {
1129710356 splice_pos_end = querylength - 1;
1129810357 } else if ((splice_pos_end = mismatch_positions[nmismatches_left-1]) > querylength - 1) {
1130810367
1130910368 } else {
1131010369 /* ? require that floors_to_pos3[segment->querypos3] <= max_mismatches_allowed */
11311 if (segment->querypos5 < index1part && last_querypos < query_lastpos) {
10370 if (first_querypos < index1part && last_querypos < query_lastpos) {
1131210371 /* genomic right anchor. No need to find splices on genomic right */
1131310372 debug4e(printf("Searching genomic right: minus genomic right anchor\n"));
1131410373 splice_pos_start = querylength;
1131510374 splice_pos_end = 0;
1131610375
11317 } else if (segment->querypos5 > index1part && last_querypos > query_lastpos) {
10376 } else if (first_querypos > index1part && last_querypos > query_lastpos) {
1131810377 /* genomic left anchor */
1131910378 debug4e(printf("Searching genomic right: minus genomic left anchor\n"));
1132010379 nmismatches_left = Genome_mismatches_left(mismatch_positions,max_mismatches_allowed,
1134610405 splice_pos_end = querylength - 1;
1134710406 }
1134810407
11349 } else if (segment->querypos5 > index1part && last_querypos < query_lastpos &&
10408 } else if (first_querypos > index1part && last_querypos < query_lastpos &&
1135010409 segment->spliceable_low_p == true) {
1135110410 /* middle anchor */
1135210411 debug4e(printf("Searching genomic right: minus middle anchor\n"));
1135310412 nmismatches_left = Genome_mismatches_left(mismatch_positions,max_mismatches_allowed,
1135410413 query_compress,/*left*/segment_left,
11355 /*pos5*/querylength - segment->querypos3 - index1part,
10414 /*pos5*/querylength - last_querypos,
1135610415 /*pos3*/querylength,plusp,genestrand);
1135710416 debug4e(
1135810417 printf("%d mismatches on left (%d allowed) at:",
1144110500 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1144210501
1144310502 if ((hit = Substring_new_donor(/*donor_coord*/segment_left + splice_pos,/*donor_knowni*/donori_knowni[i],
11444 splice_pos,/*substring_querystart*/segment->querypos5,
10503 splice_pos,/*substring_querystart*/first_querypos,
1144510504 /*substring_queryend*/last_querypos,
1144610505 nmismatches,/*prob*/2.0,/*left*/segment_left,query_compress,
1144710506 querylength,plusp,genestrand,
1144910508 segment->chrhigh,segment->chrlength)) != NULL) {
1145010509 debug4e(printf("=> %s donor: %f at %d (%d mismatches) %d..%d\n",
1145110510 plusp == true ? "plus" : "minus",Maxent_hr_donor_prob(segment_left + splice_pos,segment->chroffset),
11452 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
10511 Substring_siteD_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1145310512 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1145410513 (*distant_donors)[nmismatches] = List_push((*distant_donors)[nmismatches],(void *) hit);
1145510514 }
1146210521 debug4e(printf("Novel donor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1146310522 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1146410523 if ((hit = Substring_new_donor(/*donor_coord*/segment_left + splice_pos,/*donor_knowni*/-1,
11465 splice_pos,/*substring_querystart*/segment->querypos5,
10524 splice_pos,/*substring_querystart*/first_querypos,
1146610525 /*substring_queryend*/last_querypos,
1146710526 nmismatches,prob,/*left*/segment_left,query_compress,
1146810527 querylength,plusp,genestrand,
1146910528 sensedir,segment->chrnum,segment->chroffset,
1147010529 segment->chrhigh,segment->chrlength)) != NULL) {
1147110530 debug4e(printf("=> %s donor: %f at %d (%d mismatches) %d..%d\n",
11472 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
10531 plusp == true ? "plus" : "minus",prob,Substring_siteD_pos(hit),nmismatches,
1147310532 Substring_querystart(hit),Substring_queryend(hit)));
1147410533 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1147510534 (*distant_donors)[nmismatches] = List_push((*distant_donors)[nmismatches],(void *) hit);
1152210581 debug4e(printf("Known antiacceptor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1152310582 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1152410583 if ((hit = Substring_new_acceptor(/*acceptor_coord*/segment_left + splice_pos,/*acceptor_knowni*/antiacceptori_knowni[i],
11525 splice_pos,/*substring_querystart*/segment->querypos5,
10584 splice_pos,/*substring_querystart*/first_querypos,
1152610585 /*substring_queryend*/last_querypos,
1152710586 nmismatches,/*prob*/2.0,/*left*/segment_left,query_compress,
1152810587 querylength,plusp,genestrand,
1153010589 segment->chrhigh,segment->chrlength)) != NULL) {
1153110590 debug4e(printf("=> %s antiacceptor : %f at %d (%d mismatches) %d..%d\n",
1153210591 plusp == true ? "plus" : "minus",Maxent_hr_antiacceptor_prob(segment_left + splice_pos,segment->chroffset),
11533 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
10592 Substring_siteA_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1153410593 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1153510594 (*distant_antiacceptors)[nmismatches] = List_push((*distant_antiacceptors)[nmismatches],(void *) hit);
1153610595 }
1154310602 debug4e(printf("Novel antiacceptor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1154410603 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1154510604 if ((hit = Substring_new_acceptor(/*acceptor_coord*/segment_left + splice_pos,/*acceptor_knowni*/-1,
11546 splice_pos,/*substring_querystart*/segment->querypos5,
10605 splice_pos,/*substring_querystart*/first_querypos,
1154710606 /*substring_queryend*/last_querypos,
1154810607 nmismatches,prob,/*left*/segment_left,query_compress,
1154910608 querylength,plusp,genestrand,
1155010609 sensedir,segment->chrnum,segment->chroffset,
1155110610 segment->chrhigh,segment->chrlength)) != NULL) {
1155210611 debug4e(printf("=> %s antiacceptor : %f at %d (%d mismatches) %d..%d\n",
11553 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
10612 plusp == true ? "plus" : "minus",prob,Substring_siteA_pos(hit),nmismatches,
1155410613 Substring_querystart(hit),Substring_queryend(hit)));
1155510614 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1155610615 (*distant_antiacceptors)[nmismatches] = List_push((*distant_antiacceptors)[nmismatches],(void *) hit);
1156710626 /* Find splices on genomic left */
1156810627 if (plusp) {
1156910628 /* ? require that floors_to_pos3[segment->querypos3] <= max_mismatches_allowed */
11570 if (segment->querypos5 < index1part && last_querypos < query_lastpos) {
10629 if (first_querypos < index1part && last_querypos < query_lastpos) {
1157110630 /* genomic left anchor. No need to find splices on genomic left. */
1157210631 debug4e(printf("Searching genomic left: plus genomic left anchor\n"));
1157310632 splice_pos_start = querylength;
1157410633 splice_pos_end = 0;
1157510634
11576 } else if (segment->querypos5 > index1part && last_querypos > query_lastpos) {
10635 } else if (first_querypos > index1part && last_querypos > query_lastpos) {
1157710636 /* genomic right anchor */
1157810637 debug4e(printf("Searching genomic left: plus genomic right anchor\n"));
1157910638 nmismatches_right = Genome_mismatches_right(mismatch_positions,max_mismatches_allowed,
1160310662 splice_pos_start = 1;
1160410663 }
1160510664
11606 } else if (segment->querypos5 > index1part && last_querypos < query_lastpos &&
10665 } else if (first_querypos > index1part && last_querypos < query_lastpos &&
1160710666 segment->spliceable_high_p == true) {
1160810667 /* middle anchor */
1160910668 debug4e(printf("Searching genomic left: plus middle anchor\n"));
1163510694
1163610695 } else {
1163710696 /* ? require that floors_from_neg3[segment->querypos5] <= max_mismatches_allowed */
11638 if (segment->querypos5 < index1part && last_querypos < query_lastpos) {
10697 if (first_querypos < index1part && last_querypos < query_lastpos) {
1163910698 /* genomic right anchor */
1164010699 debug4e(printf("Searching genomic left: minus genomic right anchor\n"));
1164110700 nmismatches_right = Genome_mismatches_right(mismatch_positions,max_mismatches_allowed,
1164310702 #if 0
1164410703 /*pos5*/0,/*pos3*/querylength,
1164510704 #else
11646 /*pos5*/0,/*pos3 (was querylength)*/querylength - segment->querypos5,
10705 /*pos5*/0,/*pos3 (was querylength)*/querylength - first_querypos,
1164710706 #endif
1164810707 plusp,genestrand);
1164910708 debug4e(
1165710716 #if 0
1165810717 splice_pos_end = query_lastpos;
1165910718 #else
11660 splice_pos_end = querylength - segment->querypos5;
10719 splice_pos_end = querylength - first_querypos;
1166110720 #endif
1166210721 if (nmismatches_right <= max_mismatches_allowed) {
1166310722 splice_pos_start = 1;
1166510724 splice_pos_start = 1;
1166610725 }
1166710726
11668 } else if (segment->querypos5 > index1part && last_querypos > query_lastpos) {
10727 } else if (first_querypos > index1part && last_querypos > query_lastpos) {
1166910728 /* genomic left anchor. No need to find splices on genomic left. */
1167010729 debug4e(printf("Searching genomic left: minus genomic left anchor\n"));
1167110730 splice_pos_start = querylength;
1167210731 splice_pos_end = 0;
1167310732
11674 } else if (segment->querypos5 > index1part && last_querypos < query_lastpos &&
10733 } else if (first_querypos > index1part && last_querypos < query_lastpos &&
1167510734 segment->spliceable_high_p == true) {
1167610735 /* middle anchor */
1167710736 debug4e(printf("Searching genomic left: minus middle anchor\n"));
1167810737 nmismatches_right = Genome_mismatches_right(mismatch_positions,max_mismatches_allowed,
1167910738 query_compress,/*left*/segment_left,
11680 /*pos5*/0,/*pos3*/querylength - segment->querypos5,
10739 /*pos5*/0,/*pos3*/querylength - first_querypos,
1168110740 plusp,genestrand);
1168210741 debug4e(
1168310742 printf("%d mismatches on right (%d allowed) at:",nmismatches_right,max_mismatches_allowed);
1168710746 printf("\n");
1168810747 );
1168910748
11690 splice_pos_end = querylength - segment->querypos5;
10749 splice_pos_end = querylength - first_querypos;
1169110750 if (nmismatches_right <= max_mismatches_allowed) {
1169210751 splice_pos_start = 1;
1169310752 } else if ((splice_pos_start = mismatch_positions[nmismatches_right-1]) < 1) {
1177310832 segment->chrhigh,segment->chrlength)) != NULL) {
1177410833 debug4e(printf("=> %s acceptor: %f at %d (%d mismatches) %d..%d\n",
1177510834 plusp == true ? "plus" : "minus",Maxent_hr_acceptor_prob(segment_left + splice_pos,segment->chroffset),
11776 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
10835 Substring_siteA_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1177710836 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1177810837 (*distant_acceptors)[nmismatches] = List_push((*distant_acceptors)[nmismatches],(void *) hit);
1177910838 }
1179310852 sensedir,segment->chrnum,segment->chroffset,
1179410853 segment->chrhigh,segment->chrlength)) != NULL) {
1179510854 debug4e(printf("=> %s acceptor: %f at %d (%d mismatches) %d..%d\n",
11796 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
10855 plusp == true ? "plus" : "minus",prob,Substring_siteA_pos(hit),nmismatches,
1179710856 Substring_querystart(hit),Substring_queryend(hit)));
1179810857 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1179910858 (*distant_acceptors)[nmismatches] = List_push((*distant_acceptors)[nmismatches],(void *) hit);
1185410913 segment->chrhigh,segment->chrlength)) != NULL) {
1185510914 debug4e(printf("=> %s antidonor: %f at %d (%d mismatches) %d..%d\n",
1185610915 plusp == true ? "plus" : "minus",Maxent_hr_antidonor_prob(segment_left + splice_pos,segment->chroffset),
11857 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
10916 Substring_siteD_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1185810917 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1185910918 (*distant_antidonors)[nmismatches] = List_push((*distant_antidonors)[nmismatches],(void *) hit);
1186010919 }
1187410933 sensedir,segment->chrnum,segment->chroffset,
1187510934 segment->chrhigh,segment->chrlength)) != NULL) {
1187610935 debug4e(printf("=> %s antidonor: %f at %d (%d mismatches) %d..%d\n",
11877 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
10936 plusp == true ? "plus" : "minus",prob,Substring_siteD_pos(hit),nmismatches,
1187810937 Substring_querystart(hit),Substring_queryend(hit)));
1187910938 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1188010939 (*distant_antidonors)[nmismatches] = List_push((*distant_antidonors)[nmismatches],(void *) hit);
1188810947 }
1188910948 }
1189010949 }
10950
10951 #ifdef HAVE_ALLOCA
10952 if (querylength <= MAX_STACK_READLENGTH) {
10953 FREEA(mismatch_positions);
10954 FREEA(segment_donor_knownpos);
10955 FREEA(segment_acceptor_knownpos);
10956 FREEA(segment_antidonor_knownpos);
10957 FREEA(segment_antiacceptor_knownpos);
10958 FREEA(segment_donor_knowni);
10959 FREEA(segment_acceptor_knowni);
10960 FREEA(segment_antidonor_knowni);
10961 FREEA(segment_antiacceptor_knowni);
10962 FREEA(positions_alloc);
10963 FREEA(knowni_alloc);
10964 } else {
10965 FREE(mismatch_positions);
10966 FREE(segment_donor_knownpos);
10967 FREE(segment_acceptor_knownpos);
10968 FREE(segment_antidonor_knownpos);
10969 FREE(segment_antiacceptor_knownpos);
10970 FREE(segment_donor_knowni);
10971 FREE(segment_acceptor_knowni);
10972 FREE(segment_antidonor_knowni);
10973 FREE(segment_antiacceptor_knowni);
10974 FREE(positions_alloc);
10975 FREE(knowni_alloc);
 }
10976 #else
10977 FREE(mismatch_positions);
10978 FREE(segment_donor_knownpos);
10979 FREE(segment_acceptor_knownpos);
10980 FREE(segment_antidonor_knownpos);
10981 FREE(segment_antiacceptor_knownpos);
10982 FREE(segment_donor_knowni);
10983 FREE(segment_acceptor_knowni);
10984 FREE(segment_antidonor_knowni);
10985 FREE(segment_antiacceptor_knowni);
10986 FREE(positions_alloc);
10987 FREE(knowni_alloc);
10988 #endif
1189110989
1189210990 return;
1189310991 }
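The allocation and release blocks in the function above follow one pattern: choose stack or heap from querylength, and release using the same decision. The sketch below shows that pattern with a single scratch array. It assumes a platform that provides alloca through <alloca.h> (not part of ISO C). SKETCH_MAX_STACK_READLENGTH and scratch_sum are hypothetical names, and plain alloca/malloc/free stand in for the project's ALLOCA, MALLOC, FREEA, and FREE macros, whose definitions are not shown in this diff.

```c
#include <alloca.h>   /* non-standard; available on glibc, BSD, macOS */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold for this sketch; the real value of
   MAX_STACK_READLENGTH is configured elsewhere and is not shown here. */
#define SKETCH_MAX_STACK_READLENGTH 300

/* Fills a scratch array with 0..querylength and returns the sum.
   Returns -1 on invalid input or allocation failure. */
long scratch_sum(int querylength) {
  int *positions;
  int use_stack, i;
  long sum = 0;
  size_t nbytes;

  if (querylength < 0) {
    return -1;
  }
  nbytes = ((size_t) querylength + 1) * sizeof(int);
  use_stack = (querylength <= SKETCH_MAX_STACK_READLENGTH);

  if (use_stack) {
    /* alloca memory belongs to this function's frame and lasts until it returns */
    positions = (int *) alloca(nbytes);
  } else if ((positions = (int *) malloc(nbytes)) == NULL) {
    return -1;
  }

  for (i = 0; i <= querylength; i++) {
    positions[i] = i;
    sum += positions[i];
  }
  assert(sum == (long) querylength * (querylength + 1) / 2);

  /* Release must mirror the allocation decision: never free() alloca memory */
  if (!use_stack) {
    free(positions);
  }
  return sum;
}
```

Calling scratch_sum(10) takes the stack path and scratch_sum(1000) takes the heap path under the threshold assumed here. Recording the decision once in use_stack keeps the allocation and release branches from drifting apart, which is the risk when the same condition is written out twice.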
1192111019 Substring_T hit;
1192211020 Univcoord_T segment_left;
1192311021 int nmismatches, j, i;
11924 int splice_pos, last_querypos;
11022 int splice_pos, first_querypos, last_querypos;
1192511023 double prob;
1192611024
1192711025 int nmismatches_left, nmismatches_right;
1192911027 int sensedir;
1193011028
1193111029 int splice_pos_start, splice_pos_end;
11932
11933 #ifdef HAVE_ALLOCA
11934 int *mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
11935 int *segment_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11936 int *segment_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11937 int *segment_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11938 int *segment_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11939 int *segment_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11940 int *segment_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11941 int *segment_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11942 int *segment_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11943 int *positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
11944 int *knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
11945 #else
11946 int mismatch_positions[MAX_READLENGTH+1];
11947 int segment_donor_knownpos[MAX_READLENGTH+1], segment_acceptor_knownpos[MAX_READLENGTH+1];
11948 int segment_antidonor_knownpos[MAX_READLENGTH+1], segment_antiacceptor_knownpos[MAX_READLENGTH+1];
11949 int segment_donor_knowni[MAX_READLENGTH+1], segment_acceptor_knowni[MAX_READLENGTH+1];
11950 int segment_antidonor_knowni[MAX_READLENGTH+1], segment_antiacceptor_knowni[MAX_READLENGTH+1];
11951 int positions_alloc[MAX_READLENGTH+1];
11952 int knowni_alloc[MAX_READLENGTH+1];
11953 #endif
11954
1195511030 int segment_donor_nknown, segment_acceptor_nknown, segment_antidonor_nknown, segment_antiacceptor_nknown;
1195611031 int donori_nsites, acceptorj_nsites, antiacceptori_nsites, antidonorj_nsites;
1195711032 int *donori_positions, *acceptorj_positions, *antiacceptori_positions, *antidonorj_positions;
1195811033 int *donori_knowni, *acceptorj_knowni, *antiacceptori_knowni, *antidonorj_knowni;
1195911034
11035 int *mismatch_positions;
11036 int *segment_donor_knownpos, *segment_acceptor_knownpos, *segment_antidonor_knownpos, *segment_antiacceptor_knownpos,
11037 *segment_donor_knowni, *segment_acceptor_knowni, *segment_antidonor_knowni, *segment_antiacceptor_knowni;
11038 int *positions_alloc, *knowni_alloc;
11039
11040 #ifdef HAVE_ALLOCA
11041 if (querylength <= MAX_STACK_READLENGTH) {
11042 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
11043 segment_donor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11044 segment_acceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11045 segment_antidonor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11046 segment_antiacceptor_knownpos = (int *) ALLOCA((querylength+1)*sizeof(int));
11047 segment_donor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11048 segment_acceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11049 segment_antidonor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11050 segment_antiacceptor_knowni = (int *) ALLOCA((querylength+1)*sizeof(int));
11051 positions_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
11052 knowni_alloc = (int *) ALLOCA((querylength+1)*sizeof(int));
11053 } else {
11054 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
11055 segment_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11056 segment_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11057 segment_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11058 segment_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11059 segment_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11060 segment_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11061 segment_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11062 segment_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11063 positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
11064 knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
11065 }
11066 #else
11067 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
11068 segment_donor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11069 segment_acceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11070 segment_antidonor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11071 segment_antiacceptor_knownpos = (int *) MALLOC((querylength+1)*sizeof(int));
11072 segment_donor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11073 segment_acceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11074 segment_antidonor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11075 segment_antiacceptor_knowni = (int *) MALLOC((querylength+1)*sizeof(int));
11076 positions_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
11077 knowni_alloc = (int *) MALLOC((querylength+1)*sizeof(int));
11078 #endif
1196011079
1196111080 debug4e(printf("Entering find_spliceends_distant_rna with %d anchor segments\n",nanchors));
1196211081
1196911088 assert(segment->diagonal != (Univcoord_T) -1);
1197011089
1197111090 segment_left = segment->diagonal - querylength; /* FORMULA: Corresponds to querypos 0 */
11972 last_querypos = segment->querypos3 + index1part;
11973 assert(last_querypos <= querylength);
11091 if ((first_querypos = segment->querypos5 - (index1interval - 1)) < 0) {
11092 first_querypos = 0;
11093 }
11094 if ((last_querypos = segment->querypos3 + index1part + (index1interval - 1)) > querylength) {
11095 last_querypos = querylength;
11096 }
1197411097
1197511098 debug4e(printf("find_spliceends_distant_rna: Checking up to %d mismatches at diagonal %llu (querypos %d..%d) - querylength %d = %llu, floors %d and %d, plusp %d\n",
1197611099 max_mismatches_allowed,(unsigned long long) segment->diagonal,
1199011113 if (plusp) {
1199111114 /* ? require that floors_from_neg3[segment->querypos5] <= max_mismatches_allowed */
1199211115 if (last_querypos < query_lastpos &&
11993 (segment->querypos5 < index1part || segment->spliceable_low_p == true)) {
11116 (first_querypos < index1part || segment->spliceable_low_p == true)) {
1199411117 /* genomic left anchor or middle anchor */
11995 debug4e(printf("Searching genomic right: plus genomic left anchor or middle anchor\n"));
11118 debug4e(printf("Searching genomic right: plus genomic left anchor or middle anchor: %d..%d\n",
11119 segment->querypos5,querylength));
1199611120 nmismatches_left = Genome_mismatches_left(mismatch_positions,max_mismatches_allowed,
1199711121 query_compress,/*left*/segment_left,
1199811122 /*pos5*/segment->querypos5,/*pos3*/querylength,
1200611130 printf("\n");
1200711131 );
1200811132
12009 splice_pos_start = segment->querypos5 + 1;
11133 splice_pos_start = first_querypos + 1;
1201011134 if (nmismatches_left <= max_mismatches_allowed) {
1201111135 splice_pos_end = querylength - 1 - 1;
1201211136 } else if ((splice_pos_end = mismatch_positions[nmismatches_left-1]) > querylength - 1 - 1) {
1202211146
1202311147 } else {
1202411148 /* ? require that floors_to_pos3[segment->querypos3] <= max_mismatches_allowed */
12025 if (segment->querypos5 > index1part &&
11149 if (first_querypos > index1part &&
1202611150 (last_querypos > query_lastpos || segment->spliceable_low_p == true)) {
1202711151 /* genomic left anchor or middle anchor */
12028 debug4e(printf("Searching genomic right: minus genomic left anchor or middle anchor\n"));
11152 debug4e(printf("Searching genomic right: minus genomic left anchor or middle anchor: %d..%d\n",
11153 querylength - segment->querypos3 - index1part,querylength));
1202911154 nmismatches_left = Genome_mismatches_left(mismatch_positions,max_mismatches_allowed,
1203011155 query_compress,/*left*/segment_left,
12031 /*pos5*/querylength - last_querypos,
11156 /*pos5*/querylength - segment->querypos3 - index1part,
1203211157 /*pos3*/querylength,plusp,genestrand);
1203311158 debug4e(
1203411159 printf("%d mismatches on left (%d allowed) at:",
1211911244 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1212011245
1212111246 if ((hit = Substring_new_donor(/*donor_coord*/segment_left + splice_pos,/*donor_knowni*/donori_knowni[i],
12122 splice_pos,/*substring_querystart*/segment->querypos5,
11247 splice_pos,/*substring_querystart*/first_querypos,
1212311248 /*substring_queryend*/last_querypos,
1212411249 nmismatches,/*prob*/2.0,/*left*/segment_left,query_compress,
1212511250 querylength,plusp,genestrand,
1212711252 segment->chrhigh,segment->chrlength)) != NULL) {
1212811253 debug4e(printf("=> %s donor: %f at %d (%d mismatches) %d..%d\n",
1212911254 plusp == true ? "plus" : "minus",Maxent_hr_donor_prob(segment_left + splice_pos,segment->chroffset),
12130 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
11255 Substring_siteD_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1213111256 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1213211257 (*distant_donors)[nmismatches] = List_push((*distant_donors)[nmismatches],(void *) hit);
1213311258 }
1214011265 debug4e(printf("Novel donor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1214111266 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1214211267 if ((hit = Substring_new_donor(/*donor_coord*/segment_left + splice_pos,/*donor_knowni*/-1,
12143 splice_pos,/*substring_querystart*/segment->querypos5,
12144 /*substring_queryend*/last_querypos,
11268 splice_pos,/*substring_querystart*/first_querypos,
11269 /*substring_queryend, as last_querypos*/querylength,
1214511270 nmismatches,prob,/*left*/segment_left,query_compress,
1214611271 querylength,plusp,genestrand,
1214711272 sensedir,segment->chrnum,segment->chroffset,
1214811273 segment->chrhigh,segment->chrlength)) != NULL) {
1214911274 debug4e(printf("=> %s donor: %f at %d (%d mismatches) %d..%d\n",
12150 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
11275 plusp == true ? "plus" : "minus",prob,Substring_siteD_pos(hit),nmismatches,
1215111276 Substring_querystart(hit),Substring_queryend(hit)));
1215211277 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1215311278 (*distant_donors)[nmismatches] = List_push((*distant_donors)[nmismatches],(void *) hit);
1220211327 debug4e(printf("Known antiacceptor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1220311328 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1220411329 if ((hit = Substring_new_acceptor(/*acceptor_coord*/segment_left + splice_pos,/*acceptor_knowni*/antiacceptori_knowni[i],
12205 splice_pos,/*substring_querystart*/segment->querypos5,
11330 splice_pos,/*substring_querystart*/first_querypos,
1220611331 /*substring_queryend*/last_querypos,
1220711332 nmismatches,/*prob*/2.0,/*left*/segment_left,query_compress,
1220811333 querylength,plusp,genestrand,
1221011335 segment->chrhigh,segment->chrlength)) != NULL) {
1221111336 debug4e(printf("=> %s antiacceptor : %f at %d (%d mismatches) %d..%d\n",
1221211337 plusp == true ? "plus" : "minus",Maxent_hr_antiacceptor_prob(segment_left + splice_pos,segment->chroffset),
12213 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
11338 Substring_siteA_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1221411339 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1221511340 (*distant_antiacceptors)[nmismatches] = List_push((*distant_antiacceptors)[nmismatches],(void *) hit);
1221611341 }
1222311348 debug4e(printf("Novel antiacceptor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1222411349 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_end));
1222511350 if ((hit = Substring_new_acceptor(/*acceptor_coord*/segment_left + splice_pos,/*acceptor_knowni*/-1,
12226 splice_pos,/*substring_querystart*/segment->querypos5,
11351 splice_pos,/*substring_querystart*/first_querypos,
1222711352 /*substring_queryend*/last_querypos,
1222811353 nmismatches,prob,/*left*/segment_left,query_compress,
1222911354 querylength,plusp,genestrand,
1223011355 sensedir,segment->chrnum,segment->chroffset,
1223111356 segment->chrhigh,segment->chrlength)) != NULL) {
1223211357 debug4e(printf("=> %s antiacceptor : %f at %d (%d mismatches) %d..%d\n",
12233 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
11358 plusp == true ? "plus" : "minus",prob,Substring_siteA_pos(hit),nmismatches,
1223411359 Substring_querystart(hit),Substring_queryend(hit)));
1223511360 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1223611361 (*distant_antiacceptors)[nmismatches] = List_push((*distant_antiacceptors)[nmismatches],(void *) hit);
1224711372 /* Find splices on genomic left */
1224811373 if (plusp) {
1224911374 /* ? require that floors_to_pos3[segment->querypos3] <= max_mismatches_allowed */
12250 if (segment->querypos5 > index1part &&
11375 if (first_querypos > index1part &&
1225111376 (last_querypos > query_lastpos || segment->spliceable_high_p == true)) {
1225211377 /* genomic right anchor or middle anchor */
12253 debug4e(printf("Searching genomic left: plus genomic right anchor or middle anchor\n"));
11378 debug4e(printf("Searching genomic left: plus genomic right anchor or middle anchor: %d..%d\n",
11379 0,segment->querypos3 + index1part));
1225411380 nmismatches_right = Genome_mismatches_right(mismatch_positions,max_mismatches_allowed,
1225511381 query_compress,/*left*/segment_left,
12256 /*pos5*/0,/*pos3*/last_querypos,
11382 /*pos5*/0,/*pos3*/segment->querypos3 + index1part,
1225711383 plusp,genestrand);
1225811384 debug4e(
1225911385 printf("%d mismatches on right (%d allowed) at:",nmismatches_right,max_mismatches_allowed);
1228011406 } else {
1228111407 /* ? require that floors_from_neg3[segment->querypos5] <= max_mismatches_allowed */
1228211408 if (last_querypos < query_lastpos &&
12283 (segment->querypos5 < index1part || segment->spliceable_high_p == true)) {
11409 (first_querypos < index1part || segment->spliceable_high_p == true)) {
1228411410 /* genomic right anchor or middle anchor*/
12285 debug4e(printf("Searching genomic left: minus genomic right anchor or middle anchor\n"));
11411 debug4e(printf("Searching genomic left: minus genomic right anchor or middle anchor: %d..%d\n",
11412 0,querylength - segment->querypos5));
1228611413 nmismatches_right = Genome_mismatches_right(mismatch_positions,max_mismatches_allowed,
1228711414 query_compress,/*left*/segment_left,
1228811415 /*pos5*/0,/*pos3*/querylength - segment->querypos5,
1229511422 printf("\n");
1229611423 );
1229711424
12298 splice_pos_end = querylength - segment->querypos5 - 1 - 1;
11425 splice_pos_end = querylength - first_querypos - 1 - 1;
1229911426 if (nmismatches_right <= max_mismatches_allowed) {
1230011427 splice_pos_start = 1;
1230111428 } else if ((splice_pos_start = mismatch_positions[nmismatches_right-1]) < 1) {
1237511502 debug4e(printf("Known acceptor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1237611503 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_start));
1237711504 if ((hit = Substring_new_acceptor(/*acceptor_coord*/segment_left + splice_pos,/*acceptor_knowni*/acceptorj_knowni[i],
12378 splice_pos,/*substring_querystart*/segment->querypos5,
11505 splice_pos,/*substring_querystart*/first_querypos,
1237911506 /*substring_queryend*/last_querypos,
1238011507 nmismatches,/*prob*/2.0,/*left*/segment_left,query_compress,
1238111508 querylength,plusp,genestrand,
1238311510 segment->chrhigh,segment->chrlength)) != NULL) {
1238411511 debug4e(printf("=> %s acceptor: %f at %d (%d mismatches) %d..%d\n",
1238511512 plusp == true ? "plus" : "minus",Maxent_hr_acceptor_prob(segment_left + splice_pos,segment->chroffset),
12386 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
11513 Substring_siteA_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1238711514 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1238811515 (*distant_acceptors)[nmismatches] = List_push((*distant_acceptors)[nmismatches],(void *) hit);
1238911516 }
1239611523 debug4e(printf("Novel acceptor for segment at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1239711524 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_start));
1239811525 if ((hit = Substring_new_acceptor(/*acceptor_coord*/segment_left + splice_pos,/*acceptor_knowni*/-1,
12399 splice_pos,/*substring_querystart*/segment->querypos5,
11526 splice_pos,/*substring_querystart*/first_querypos,
1240011527 /*substring_queryend*/last_querypos,
1240111528 nmismatches,prob,/*left*/segment_left,query_compress,
1240211529 querylength,plusp,genestrand,
1240311530 sensedir,segment->chrnum,segment->chroffset,
1240411531 segment->chrhigh,segment->chrlength)) != NULL) {
1240511532 debug4e(printf("=> %s acceptor: %f at %d (%d mismatches) %d..%d\n",
12406 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
11533 plusp == true ? "plus" : "minus",prob,Substring_siteA_pos(hit),nmismatches,
1240711534 Substring_querystart(hit),Substring_queryend(hit)));
1240811535 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1240911536 (*distant_acceptors)[nmismatches] = List_push((*distant_acceptors)[nmismatches],(void *) hit);
1245811585 debug4e(printf("Known antidonor for segmenti at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1245911586 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_start));
1246011587 if ((hit = Substring_new_donor(/*donor_coord*/segment_left + splice_pos,/*donor_knowni*/antidonorj_knowni[i],
12461 splice_pos,/*substring_querystart*/segment->querypos5,
11588 splice_pos,/*substring_querystart*/first_querypos,
1246211589 /*substring_queryend*/last_querypos,
1246311590 nmismatches,/*prob*/2.0,/*left*/segment_left,query_compress,
1246411591 querylength,plusp,genestrand,
1246611593 segment->chrhigh,segment->chrlength)) != NULL) {
1246711594 debug4e(printf("=> %s antidonor: %f at %d (%d mismatches) %d..%d\n",
1246811595 plusp == true ? "plus" : "minus",Maxent_hr_antidonor_prob(segment_left + splice_pos,segment->chroffset),
12469 Substring_chimera_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
11596 Substring_siteD_pos(hit),nmismatches,Substring_querystart(hit),Substring_queryend(hit)));
1247011597 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1247111598 (*distant_antidonors)[nmismatches] = List_push((*distant_antidonors)[nmismatches],(void *) hit);
1247211599 }
1247911606 debug4e(printf("Novel antidonor for segmenti at %llu, splice_pos %d (%d mismatches), stopi = %d\n",
1248011607 (unsigned long long) segment_left,splice_pos,nmismatches,splice_pos_start));
1248111608 if ((hit = Substring_new_donor(/*donor_coord*/segment_left + splice_pos,/*donor_knowni*/-1,
12482 splice_pos,/*substring_querystart*/segment->querypos5,
11609 splice_pos,/*substring_querystart*/first_querypos,
1248311610 /*substring_queryend*/last_querypos,
1248411611 nmismatches,prob,/*left*/segment_left,query_compress,
1248511612 querylength,plusp,genestrand,
1248611613 sensedir,segment->chrnum,segment->chroffset,
1248711614 segment->chrhigh,segment->chrlength)) != NULL) {
1248811615 debug4e(printf("=> %s antidonor: %f at %d (%d mismatches) %d..%d\n",
12489 plusp == true ? "plus" : "minus",prob,Substring_chimera_pos(hit),nmismatches,
11616 plusp == true ? "plus" : "minus",prob,Substring_siteD_pos(hit),nmismatches,
1249011617 Substring_querystart(hit),Substring_queryend(hit)));
1249111618 debug4e(printf("q: %s\ng: %s\n",queryptr,gbuffer));
1249211619 (*distant_antidonors)[nmismatches] = List_push((*distant_antidonors)[nmismatches],(void *) hit);
1250011627 }
1250111628 }
1250211629 }
11630
11631 #ifdef HAVE_ALLOCA
11632 if (querylength <= MAX_STACK_READLENGTH) {
11633 FREEA(mismatch_positions);
11634 FREEA(segment_donor_knownpos);
11635 FREEA(segment_acceptor_knownpos);
11636 FREEA(segment_antidonor_knownpos);
11637 FREEA(segment_antiacceptor_knownpos);
11638 FREEA(segment_donor_knowni);
11639 FREEA(segment_acceptor_knowni);
11640 FREEA(segment_antidonor_knowni);
11641 FREEA(segment_antiacceptor_knowni);
11642 FREEA(positions_alloc);
11643 FREEA(knowni_alloc);
11644 } else {
11645 FREE(mismatch_positions);
11646 FREE(segment_donor_knownpos);
11647 FREE(segment_acceptor_knownpos);
11648 FREE(segment_antidonor_knownpos);
11649 FREE(segment_antiacceptor_knownpos);
11650 FREE(segment_donor_knowni);
11651 FREE(segment_acceptor_knowni);
11652 FREE(segment_antidonor_knowni);
11653 FREE(segment_antiacceptor_knowni);
11654 FREE(positions_alloc);
11655 FREE(knowni_alloc);
11656 }
11657 #else
11658 FREE(mismatch_positions);
11659 FREE(segment_donor_knownpos);
11660 FREE(segment_acceptor_knownpos);
11661 FREE(segment_antidonor_knownpos);
11662 FREE(segment_antiacceptor_knownpos);
11663 FREE(segment_donor_knowni);
11664 FREE(segment_acceptor_knowni);
11665 FREE(segment_antidonor_knowni);
11666 FREE(segment_antiacceptor_knowni);
11667 FREE(positions_alloc);
11668 FREE(knowni_alloc);
11669 #endif
1250311670
1250411671 return;
1250511672 }
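The hunks above replace fixed-size arrays of MAX_READLENGTH with a run-time choice between stack and heap, keyed on querylength versus MAX_STACK_READLENGTH, and release each buffer through the matching path. A minimal standalone sketch of that pattern follows. It assumes plain `alloca`/`malloc`/`free` in place of the project's ALLOCA, MALLOC, FREE and FREEA macros, whose definitions are not shown in this diff. The names `demo_count_mismatches` and `DEMO_MAX_STACK_READLENGTH`, and the threshold value itself, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#ifdef HAVE_ALLOCA
#include <alloca.h>
#endif

/* Hypothetical threshold; the real MAX_STACK_READLENGTH is set at configure time */
#define DEMO_MAX_STACK_READLENGTH 300

/* Records the positions where query and genome differ in a scratch array of
   querylength+1 ints, and returns the number of mismatches (-1 on allocation
   failure).  Short reads use the stack when alloca is available; long reads
   always use the heap. */
int
demo_count_mismatches (const char *query, const char *genome, int querylength) {
  int *mismatch_positions;
  bool on_heap = true;
  int nmismatches = 0, i;

  assert(querylength >= 0);

#ifdef HAVE_ALLOCA
  if (querylength <= DEMO_MAX_STACK_READLENGTH) {
    /* Stack memory: released automatically when this function returns */
    mismatch_positions = (int *) alloca((size_t) (querylength+1)*sizeof(int));
    on_heap = false;
  } else {
    mismatch_positions = (int *) malloc((size_t) (querylength+1)*sizeof(int));
  }
#else
  mismatch_positions = (int *) malloc((size_t) (querylength+1)*sizeof(int));
#endif

  if (on_heap && mismatch_positions == NULL) {
    return -1;
  }

  for (i = 0; i < querylength; i++) {
    if (query[i] != genome[i]) {
      mismatch_positions[nmismatches++] = i;
    }
  }
  /* The extra slot holds a sentinel, which is why querylength+1 ints are requested */
  mismatch_positions[nmismatches] = querylength;

  /* Free only what came from the heap; the same test must guard both
     the allocation and the release */
  if (on_heap) {
    free(mismatch_positions);
  }

  return nmismatches;
}
```

The point of repeating the `querylength` test at release time, as the diff does, is that the buffer's origin is not recorded in the pointer itself. The sketch tracks it in a flag instead, which is an illustrative simplification rather than what the patch does.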
1252411691 Univcoord_T segment_left;
1252511692 int nmismatches_left, nmismatches_right;
1252611693 Endtype_T start_endtype, end_endtype;
12527
12528 #ifdef HAVE_ALLOCA
12529 int *mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
12530 #else
12531 int mismatch_positions[MAX_READLENGTH+1];
12532 #endif
12533
1253411694 /* int *floors_from_neg3, *floors_to_pos3; */
1253511695 int max_terminal_length;
1253611696 int nterminals_left, nterminals_right, nterminals_middle;
1253811698 #ifdef DEBUG4T
1253911699 int i;
1254011700 #endif
11701
11702 int *mismatch_positions;
11703
11704 #ifdef HAVE_ALLOCA
11705 if (querylength <= MAX_STACK_READLENGTH) {
11706 mismatch_positions = (int *) ALLOCA((querylength+1)*sizeof(int));
11707 } else {
11708 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
11709 }
11710 #else
11711 mismatch_positions = (int *) MALLOC((querylength+1)*sizeof(int));
11712 #endif
11713
1254111714
1254211715 debug(printf("identify_terminals: Checking up to %d mismatches\n",max_mismatches_allowed));
1254311716
1298012153 minus_terminals_right = (List_T) NULL;
1298112154 }
1298212155
12156 #ifdef HAVE_ALLOCA
12157 if (querylength <= MAX_STACK_READLENGTH) {
12158 FREEA(mismatch_positions);
12159 } else {
12160 FREE(mismatch_positions);
12161 }
12162 #else
12163 FREE(mismatch_positions);
12164 #endif
12165
1298312166 return List_append(plus_terminals_middle,
1298412167 List_append(plus_terminals_left,
1298512168 List_append(plus_terminals_right,
1316112344 intragenic_splice_p (Chrpos_T splicedistance, Substring_T donor, Substring_T acceptor) {
1316212345 int knowni;
1316312346
13164 if ((knowni = Substring_splicesites_knowni(donor)) >= 0) {
12347 if ((knowni = Substring_splicesitesD_knowni(donor)) >= 0) {
1316512348 if (splicedists[knowni] >= splicedistance) {
1316612349 return true;
1316712350 }
1316812351 }
1316912352
13170 if ((knowni = Substring_splicesites_knowni(acceptor)) >= 0) {
12353 if ((knowni = Substring_splicesitesA_knowni(acceptor)) >= 0) {
1317112354 if (splicedists[knowni] >= splicedistance) {
1317212355 return true;
1317312356 }
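The pairing loops in the hunks that follow (startfrag/endfrag and donor/acceptor) share one shape: two lists are walked in step, the side with the smaller splice position is advanced, and when both positions agree every combination at that position is generated by rewinding `q` to `qsave`. The sketch below isolates that traversal. It assumes both lists are sorted by ascending position and that the enclosing loop runs while both `p` and `q` are non-NULL; neither is visible in the diff context. `Demo_site`, `demo_count_pairs`, `min_pos` and `max_pos` are hypothetical stand-ins for `List_T` cells of `Substring_T`, the pairing function, and the `min_endlength` bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list cell holding a substring's splice position */
typedef struct Demo_site {
  int pos;
  struct Demo_site *rest;
} Demo_site;

/* Counts (p, q) pairs that share a splice position within [min_pos, max_pos].
   Both lists are assumed to be sorted by ascending pos. */
int
demo_count_pairs (const Demo_site *p, const Demo_site *q, int min_pos, int max_pos) {
  const Demo_site *qsave;
  int npairs = 0, pos;

  assert(min_pos <= max_pos);

  while (p != NULL && q != NULL) {
    pos = p->pos;
    if (pos < min_pos || pos > max_pos) {
      /* Position too close to either end of the read: skip this p */
      p = p->rest;
    } else if (pos < q->pos) {
      p = p->rest;
    } else if (pos > q->pos) {
      q = q->rest;
    } else {
      /* Generate all pairs at this position */
      qsave = q;
      while (p != NULL && p->pos == pos) {
        q = qsave;		/* rewind q for each p at this position */
        while (q != NULL && q->pos == pos) {
          npairs++;
          q = q->rest;
        }
        p = p->rest;
      }
    }
  }

  return npairs;
}
```

The actual code builds a splice object for each pair and, in the distant-splicing variants, also stops once `*ndistantsplicepairs` exceeds MAXCHIMERAPATHS; the sketch only counts pairs.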
1323812421 (unsigned long long) Substring_genomicstart(endfrag),
1323912422 Substring_querystart(endfrag),Substring_queryend(endfrag)));
1324012423
13241 if ((pos = Substring_chimera_pos(startfrag)) < min_endlength_1) {
12424 if ((pos = Substring_siteN_pos(startfrag)) < min_endlength_1) {
1324212425 debug4ld(printf("chimera_pos of startfrag < min_endlength_1\n"));
1324312426 p = p->rest;
1324412427 } else if (pos > querylength - min_endlength_2) {
1324512428 debug4ld(printf("chimera_pos of startfrag > querylength - min_endlength_2\n"));
1324612429 p = p->rest;
13247 } else if (pos < Substring_chimera_pos(endfrag)) {
13248 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12430 } else if (pos < Substring_siteN_pos(endfrag)) {
12431 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1324912432 p = p->rest;
13250 } else if (pos > Substring_chimera_pos(endfrag)) {
13251 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12433 } else if (pos > Substring_siteN_pos(endfrag)) {
12434 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1325212435 q = q->rest;
1325312436 } else {
1325412437 /* Generate all pairs at this splice_pos */
1325512438 qsave = q;
13256 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
12439 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteN_pos(((Substring_T) p->first)) == pos) {
1325712440 startfrag = (Substring_T) p->first;
1325812441 debug4ld(printf("startfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(startfrag),pos));
1325912442 q = qsave;
13260 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
12443 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteN_pos(((Substring_T) q->first)) == pos) {
1326112444 endfrag = (Substring_T) q->first;
1326212445 debug4ld(printf("endfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(endfrag),pos));
1326312446 if (Substring_genomicstart(endfrag) == Substring_genomicstart(startfrag)) {
1333312516 (unsigned long long) Substring_genomicstart(endfrag),
1333412517 Substring_querystart(endfrag),Substring_queryend(endfrag)));
1333512518
13336 if ((pos = Substring_chimera_pos(startfrag)) < min_endlength_1) {
12519 if ((pos = Substring_siteN_pos(startfrag)) < min_endlength_1) {
1333712520 debug4ld(printf("chimera_pos of startfrag < min_endlength_1\n"));
1333812521 p = p->rest;
1333912522 } else if (pos > querylength - min_endlength_2) {
1334012523 debug4ld(printf("chimera_pos of startfrag > querylength - min_endlength_2\n"));
1334112524 p = p->rest;
13342 } else if (pos < Substring_chimera_pos(endfrag)) {
13343 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12525 } else if (pos < Substring_siteN_pos(endfrag)) {
12526 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1334412527 p = p->rest;
13345 } else if (pos > Substring_chimera_pos(endfrag)) {
13346 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12528 } else if (pos > Substring_siteN_pos(endfrag)) {
12529 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1334712530 q = q->rest;
1334812531 } else {
1334912532 qsave = q;
13350 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
12533 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteN_pos(((Substring_T) p->first)) == pos) {
1335112534 startfrag = (Substring_T) p->first;
1335212535 debug4ld(printf("startfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(startfrag),pos));
1335312536 q = qsave;
13354 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
12537 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteN_pos(((Substring_T) q->first)) == pos) {
1335512538 endfrag = (Substring_T) q->first;
1335612539 debug4ld(printf("endfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(endfrag),pos));
1335712540 if (Substring_genomicstart(endfrag) == Substring_genomicstart(startfrag)) {
1343312616 (unsigned long long) Substring_genomicstart(endfrag),
1343412617 Substring_querystart(endfrag),Substring_queryend(endfrag)));
1343512618
13436 if ((pos = Substring_chimera_pos(startfrag)) < min_endlength_1) {
12619 if ((pos = Substring_siteN_pos(startfrag)) < min_endlength_1) {
1343712620 debug4ld(printf("chimera_pos of startfrag < min_endlength_1\n"));
1343812621 p = p->rest;
1343912622 } else if (pos > querylength - min_endlength_2) {
1344012623 debug4ld(printf("chimera_pos of startfrag > querylength - min_endlength_2\n"));
1344112624 p = p->rest;
13442 } else if (pos < Substring_chimera_pos(endfrag)) {
13443 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12625 } else if (pos < Substring_siteN_pos(endfrag)) {
12626 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1344412627 p = p->rest;
13445 } else if (pos > Substring_chimera_pos(endfrag)) {
13446 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12628 } else if (pos > Substring_siteN_pos(endfrag)) {
12629 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1344712630 q = q->rest;
1344812631 } else {
1344912632 qsave = q;
13450 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
12633 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteN_pos(((Substring_T) p->first)) == pos) {
1345112634 startfrag = (Substring_T) p->first;
1345212635 debug4ld(printf("startfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(startfrag),pos));
1345312636 q = qsave;
13454 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
12637 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteN_pos(((Substring_T) q->first)) == pos) {
1345512638 endfrag = (Substring_T) q->first;
1345612639 debug4ld(printf("endfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(endfrag),pos));
1345712640 if (Substring_chrnum(startfrag) != Substring_chrnum(endfrag)) {
1349712680 (unsigned long long) Substring_genomicstart(endfrag),
1349812681 Substring_querystart(endfrag),Substring_queryend(endfrag)));
1349912682
13500 if ((pos = Substring_chimera_pos(startfrag)) < min_endlength_1) {
12683 if ((pos = Substring_siteN_pos(startfrag)) < min_endlength_1) {
1350112684 debug4ld(printf("chimera_pos of startfrag < min_endlength_1\n"));
1350212685 p = p->rest;
1350312686 } else if (pos > querylength - min_endlength_2) {
1350412687 debug4ld(printf("chimera_pos of startfrag > querylength - min_endlength_2\n"));
1350512688 p = p->rest;
13506 } else if (pos < Substring_chimera_pos(endfrag)) {
13507 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12689 } else if (pos < Substring_siteN_pos(endfrag)) {
12690 debug4ld(printf("chimera_pos of startfrag %d < chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1350812691 p = p->rest;
13509 } else if (pos > Substring_chimera_pos(endfrag)) {
13510 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_chimera_pos(endfrag)));
12692 } else if (pos > Substring_siteN_pos(endfrag)) {
12693 debug4ld(printf("chimera_pos of startfrag %d > chimera_pos of endfrag %d\n",pos,Substring_siteN_pos(endfrag)));
1351112694 q = q->rest;
1351212695 } else {
1351312696 qsave = q;
13514 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
12697 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteN_pos(((Substring_T) p->first)) == pos) {
1351512698 startfrag = (Substring_T) p->first;
1351612699 debug4ld(printf("startfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(startfrag),pos));
1351712700 q = qsave;
13518 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
12701 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteN_pos(((Substring_T) q->first)) == pos) {
1351912702 endfrag = (Substring_T) q->first;
1352012703 debug4ld(printf("endfrag at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(endfrag),pos));
1352112704 if (Substring_chrnum(startfrag) != Substring_chrnum(endfrag)) {
1362512808 (unsigned long long) Substring_genomicstart(acceptor),
1362612809 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1362712810
13628 if ((pos = Substring_chimera_pos(donor)) < min_endlength_1) {
12811 if ((pos = Substring_siteD_pos(donor)) < min_endlength_1) {
1362912812 debug4ld(printf("chimera_pos of donor < min_endlength_1\n"));
1363012813 p = p->rest;
1363112814 } else if (pos > querylength - min_endlength_2) {
1363212815 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_2\n"));
1363312816 p = p->rest;
13634 } else if (pos < Substring_chimera_pos(acceptor)) {
13635 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
12817 } else if (pos < Substring_siteA_pos(acceptor)) {
12818 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1363612819 p = p->rest;
13637 } else if (pos > Substring_chimera_pos(acceptor)) {
13638 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
12820 } else if (pos > Substring_siteA_pos(acceptor)) {
12821 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1363912822 q = q->rest;
1364012823 } else {
1364112824 /* Generate all pairs at this splice_pos */
1364212825 qsave = q;
13643 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
12826 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1364412827 donor = (Substring_T) p->first;
1364512828 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1364612829 q = qsave;
13647 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
12830 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1364812831 acceptor = (Substring_T) q->first;
1364912832 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1365012833 if (Substring_genomicstart(acceptor) == Substring_genomicstart(donor)) {
1367512858 if (shortdistancep) {
1367612859 *localsplicing = List_push(*localsplicing,
1367712860 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13678 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
12861 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1367912862 /*shortdistancep*/true,localsplicing_penalty,querylength,
1368012863 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1368112864 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1368612869 } else if (*ndistantsplicepairs <= MAXCHIMERAPATHS) {
1368712870 distantsplicing = List_push(distantsplicing,
1368812871 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13689 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
12872 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1369012873 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1369112874 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1369212875 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1372012903 (unsigned long long) Substring_genomicstart(acceptor),
1372112904 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1372212905
13723 if ((pos = Substring_chimera_pos(donor)) < min_endlength_1) {
12906 if ((pos = Substring_siteD_pos(donor)) < min_endlength_1) {
1372412907 debug4ld(printf("chimera_pos of donor < min_endlength_1\n"));
1372512908 p = p->rest;
1372612909 } else if (pos > querylength - min_endlength_2) {
1372712910 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_2\n"));
1372812911 p = p->rest;
13729 } else if (pos < Substring_chimera_pos(acceptor)) {
13730 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
12912 } else if (pos < Substring_siteA_pos(acceptor)) {
12913 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1373112914 p = p->rest;
13732 } else if (pos > Substring_chimera_pos(acceptor)) {
13733 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
12915 } else if (pos > Substring_siteA_pos(acceptor)) {
12916 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1373412917 q = q->rest;
1373512918 } else {
1373612919 qsave = q;
13737 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
12920 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1373812921 donor = (Substring_T) p->first;
1373912922 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1374012923 q = qsave;
13741 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
12924 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1374212925 acceptor = (Substring_T) q->first;
1374312926 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1374412927 if (Substring_genomicstart(acceptor) == Substring_genomicstart(donor)) {
1376812951 if (shortdistancep) {
1376912952 *localsplicing = List_push(*localsplicing,
1377012953 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13771 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
12954 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1377212955 /*shortdistancep*/true,localsplicing_penalty,querylength,
1377312956 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1377412957 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1377912962 } else if (*ndistantsplicepairs <= MAXCHIMERAPATHS) {
1378012963 distantsplicing = List_push(distantsplicing,
1378112964 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13782 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
12965 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1378312966 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1378412967 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1378512968 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1381212995 (unsigned long long) Substring_genomicstart(acceptor),
1381312996 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1381412997
13815 if ((pos = Substring_chimera_pos(donor)) < min_endlength_2) {
12998 if ((pos = Substring_siteD_pos(donor)) < min_endlength_2) {
1381612999 debug4ld(printf("chimera_pos of donor < min_endlength_2\n"));
1381713000 p = p->rest;
1381813001 } else if (pos > querylength - min_endlength_1) {
1381913002 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_1\n"));
1382013003 p = p->rest;
13821 } else if (pos < Substring_chimera_pos(acceptor)) {
13822 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13004 } else if (pos < Substring_siteA_pos(acceptor)) {
13005 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1382313006 p = p->rest;
13824 } else if (pos > Substring_chimera_pos(acceptor)) {
13825 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13007 } else if (pos > Substring_siteA_pos(acceptor)) {
13008 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1382613009 q = q->rest;
1382713010 } else {
1382813011 qsave = q;
13829 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
13012 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1383013013 donor = (Substring_T) p->first;
1383113014 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1383213015 q = qsave;
13833 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
13016 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1383413017 acceptor = (Substring_T) q->first;
1383513018 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1383613019 if (Substring_genomicstart(acceptor) == Substring_genomicstart(donor)) {
1386113044 if (shortdistancep) {
1386213045 *localsplicing = List_push(*localsplicing,
1386313046 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13864 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13047 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1386513048 /*shortdistancep*/true,localsplicing_penalty,querylength,
1386613049 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1386713050 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1387213055 } else if (*ndistantsplicepairs <= MAXCHIMERAPATHS) {
1387313056 distantsplicing = List_push(distantsplicing,
1387413057 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13875 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13058 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1387613059 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1387713060 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1387813061 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1390513088 (unsigned long long) Substring_genomicstart(acceptor),
1390613089 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1390713090
13908 if ((pos = Substring_chimera_pos(donor)) < min_endlength_2) {
13091 if ((pos = Substring_siteD_pos(donor)) < min_endlength_2) {
1390913092 debug4ld(printf("chimera_pos of donor < min_endlength_2\n"));
1391013093 p = p->rest;
1391113094 } else if (pos > querylength - min_endlength_1) {
1391213095 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_1\n"));
1391313096 p = p->rest;
13914 } else if (pos < Substring_chimera_pos(acceptor)) {
13915 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13097 } else if (pos < Substring_siteA_pos(acceptor)) {
13098 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1391613099 p = p->rest;
13917 } else if (pos > Substring_chimera_pos(acceptor)) {
13918 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13100 } else if (pos > Substring_siteA_pos(acceptor)) {
13101 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1391913102 q = q->rest;
1392013103 } else {
1392113104 qsave = q;
1392213105
13923 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
13106 while (p != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1392413107 donor = (Substring_T) p->first;
1392513108 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1392613109 q = qsave;
13927 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
13110 while (q != NULL /* && *nsplicepairs <= MAXCHIMERAPATHS */ && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1392813111 acceptor = (Substring_T) q->first;
1392913112 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1393013113 if (Substring_genomicstart(acceptor) == Substring_genomicstart(donor)) {
1395413137 if (shortdistancep) {
1395513138 *localsplicing = List_push(*localsplicing,
1395613139 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13957 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13140 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1395813141 /*shortdistancep*/true,localsplicing_penalty,querylength,
1395913142 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1396013143 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1396513148 } else if (*ndistantsplicepairs <= MAXCHIMERAPATHS) {
1396613149 distantsplicing = List_push(distantsplicing,
1396713150 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
13968 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13151 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1396913152 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1397013153 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1397113154 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1400313186 (unsigned long long) Substring_genomicstart(acceptor),
1400413187 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1400513188
14006 if ((pos = Substring_chimera_pos(donor)) < min_endlength_1) {
13189 if ((pos = Substring_siteD_pos(donor)) < min_endlength_1) {
1400713190 debug4ld(printf("chimera_pos of donor < min_endlength_1\n"));
1400813191 p = p->rest;
1400913192 } else if (pos > querylength - min_endlength_2) {
1401013193 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_2\n"));
1401113194 p = p->rest;
14012 } else if (pos < Substring_chimera_pos(acceptor)) {
14013 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13195 } else if (pos < Substring_siteA_pos(acceptor)) {
13196 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1401413197 p = p->rest;
14015 } else if (pos > Substring_chimera_pos(acceptor)) {
14016 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13198 } else if (pos > Substring_siteA_pos(acceptor)) {
13199 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1401713200 q = q->rest;
1401813201 } else {
1401913202 qsave = q;
14020 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
13203 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1402113204 donor = (Substring_T) p->first;
1402213205 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1402313206 q = qsave;
14024 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
13207 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1402513208 acceptor = (Substring_T) q->first;
1402613209 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1402713210 if (Substring_chrnum(donor) != Substring_chrnum(acceptor)) {
1403713220 (unsigned long long) Substring_genomicstart(acceptor)));
1403813221 distantsplicing = List_push(distantsplicing,
1403913222 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
14040 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13223 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1404113224 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1404213225 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1404313226 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1406713250 (unsigned long long) Substring_genomicstart(acceptor),
1406813251 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1406913252
14070 if ((pos = Substring_chimera_pos(donor)) < min_endlength_1) {
13253 if ((pos = Substring_siteD_pos(donor)) < min_endlength_1) {
1407113254 debug4ld(printf("chimera_pos of donor < min_endlength_1\n"));
1407213255 p = p->rest;
1407313256 } else if (pos > querylength - min_endlength_2) {
1407413257 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_2\n"));
1407513258 p = p->rest;
14076 } else if (pos < Substring_chimera_pos(acceptor)) {
14077 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13259 } else if (pos < Substring_siteA_pos(acceptor)) {
13260 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1407813261 p = p->rest;
14079 } else if (pos > Substring_chimera_pos(acceptor)) {
14080 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13262 } else if (pos > Substring_siteA_pos(acceptor)) {
13263 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1408113264 q = q->rest;
1408213265 } else {
1408313266 qsave = q;
14084 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
13267 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1408513268 donor = (Substring_T) p->first;
1408613269 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1408713270 q = qsave;
14088 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
13271 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1408913272 acceptor = (Substring_T) q->first;
1409013273 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1409113274 if (Substring_chrnum(donor) != Substring_chrnum(acceptor)) {
1410113284 (unsigned long long) Substring_genomicstart(acceptor)));
1410213285 distantsplicing = List_push(distantsplicing,
1410313286 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
14104 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13287 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1410513288 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1410613289 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1410713290 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1413213315 (unsigned long long) Substring_genomicstart(acceptor),
1413313316 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1413413317
14135 if ((pos = Substring_chimera_pos(donor)) < min_endlength_2) {
13318 if ((pos = Substring_siteD_pos(donor)) < min_endlength_2) {
1413613319 debug4ld(printf("chimera_pos of donor < min_endlength_2\n"));
1413713320 p = p->rest;
1413813321 } else if (pos > querylength - min_endlength_1) {
1413913322 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_1\n"));
1414013323 p = p->rest;
14141 } else if (pos < Substring_chimera_pos(acceptor)) {
14142 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13324 } else if (pos < Substring_siteA_pos(acceptor)) {
13325 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1414313326 p = p->rest;
14144 } else if (pos > Substring_chimera_pos(acceptor)) {
14145 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13327 } else if (pos > Substring_siteA_pos(acceptor)) {
13328 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1414613329 q = q->rest;
1414713330 } else {
1414813331 qsave = q;
14149 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
13332 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1415013333 donor = (Substring_T) p->first;
1415113334 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1415213335 q = qsave;
14153 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
13336 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1415413337 acceptor = (Substring_T) q->first;
1415513338 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1415613339 if (Substring_chrnum(donor) != Substring_chrnum(acceptor)) {
1416613349 (unsigned long long) Substring_genomicstart(acceptor)));
1416713350 distantsplicing = List_push(distantsplicing,
1416813351 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
14169 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13352 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1417013353 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1417113354 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1417213355 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1419613379 (unsigned long long) Substring_genomicstart(acceptor),
1419713380 Substring_querystart(acceptor),Substring_queryend(acceptor)));
1419813381
14199 if ((pos = Substring_chimera_pos(donor)) < min_endlength_2) {
13382 if ((pos = Substring_siteD_pos(donor)) < min_endlength_2) {
1420013383 debug4ld(printf("chimera_pos of donor < min_endlength_2\n"));
1420113384 p = p->rest;
1420213385 } else if (pos > querylength - min_endlength_1) {
1420313386 debug4ld(printf("chimera_pos of donor > querylength - min_endlength_1\n"));
1420413387 p = p->rest;
14205 } else if (pos < Substring_chimera_pos(acceptor)) {
14206 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13388 } else if (pos < Substring_siteA_pos(acceptor)) {
13389 debug4ld(printf("chimera_pos of donor %d < chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1420713390 p = p->rest;
14208 } else if (pos > Substring_chimera_pos(acceptor)) {
14209 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_chimera_pos(acceptor)));
13391 } else if (pos > Substring_siteA_pos(acceptor)) {
13392 debug4ld(printf("chimera_pos of donor %d > chimera_pos of acceptor %d\n",pos,Substring_siteA_pos(acceptor)));
1421013393 q = q->rest;
1421113394 } else {
1421213395 qsave = q;
14213 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) p->first)) == pos) {
13396 while (p != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteD_pos(((Substring_T) p->first)) == pos) {
1421413397 donor = (Substring_T) p->first;
1421513398 debug4ld(printf("donor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(donor),pos));
1421613399 q = qsave;
14217 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_chimera_pos(((Substring_T) q->first)) == pos) {
13400 while (q != NULL && *ndistantsplicepairs <= MAXCHIMERAPATHS && Substring_siteA_pos(((Substring_T) q->first)) == pos) {
1421813401 acceptor = (Substring_T) q->first;
1421913402 debug4ld(printf("acceptor at %llu, pos %d\n",(unsigned long long) Substring_genomicstart(acceptor),pos));
1422013403 if (Substring_chrnum(donor) != Substring_chrnum(acceptor)) {
1423013413 (unsigned long long) Substring_genomicstart(acceptor)));
1423113414 distantsplicing = List_push(distantsplicing,
1423213415 (void *) Stage3end_new_splice(&(*found_score),nmismatches1,nmismatches2,
14233 donor,acceptor,Substring_chimera_prob(donor),Substring_chimera_prob(acceptor),distance,
13416 donor,acceptor,Substring_siteD_prob(donor),Substring_siteA_prob(acceptor),distance,
1423413417 /*shortdistancep*/false,distantsplicing_penalty,querylength,
1423513418 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1423613419 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1431913502 /* End 1 */
1432013503 for (p = donors_plus[nmismatches]; p != NULL; p = p->rest) {
1432113504 donor = (Substring_T) p->first;
14322 support = Substring_chimera_pos(donor);
13505 support = Substring_siteD_pos(donor);
1432313506 endlength = querylength - support;
1432413507 chrhigh = Substring_chrhigh(donor);
1432513508
1433413517 debug4h(printf("End 1: short-overlap donor_plus: #%d:%u (%d mismatches) => searching right\n",
1433513518 Substring_chrnum(donor),(Chrpos_T) (leftbound-1-chroffset),Substring_nmismatches_whole(donor)));
1433613519
14337 if ((i = Substring_splicesites_knowni(donor)) >= 0) {
13520 if ((i = Substring_splicesitesD_knowni(donor)) >= 0) {
1433813521 origleft = Substring_genomicstart(donor);
1433913522 if ((splicesites_i =
1434013523 Splicetrie_find_right(&nmismatches_shortend,&nmismatches_list,i,
1434713530 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1434813531 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1434913532 debug4h(printf("End 1: short-overlap donor_plus: Successful ambiguous from donor #%d with amb_length %d\n",
14350 Substring_splicesites_knowni(donor),amb_length));
13533 Substring_splicesitesD_knowni(donor),amb_length));
1435113534 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14352 donor,/*acceptor*/NULL,Substring_chimera_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
13535 donor,/*acceptor*/NULL,Substring_siteD_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
1435313536 /*shortdistancep*/false,/*penalty*/0,querylength,
1435413537 /*ambcoords_donor*/NULL,ambcoords,
1435513538 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/splicesites_i,
1436713550 bestj = Intlist_head(splicesites_i);
1436813551 bestleft = splicesites[bestj] - support;
1436913552 if ((acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[bestj],/*acceptor_knowni*/bestj,
14370 Substring_chimera_pos(donor),/*substring_querystart*/0,/*substring_queryend*/querylength,
13553 Substring_siteD_pos(donor),/*substring_querystart*/0,/*substring_queryend*/querylength,
1437113554 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_fwd,
1437213555 querylength,/*plusp*/true,genestrand,/*sensedir*/SENSE_FORWARD,
1437313556 Substring_chrnum(donor),Substring_chroffset(donor),
1437413557 Substring_chrhigh(donor),Substring_chrlength(donor))) != NULL) {
1437513558 debug4h(printf("End 1: short-overlap donor_plus: Successful splice from donor #%d to acceptor #%d\n",
14376 Substring_splicesites_knowni(donor),Substring_splicesites_knowni(acceptor)));
13559 Substring_splicesitesD_knowni(donor),Substring_splicesitesA_knowni(acceptor)));
1437713560 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14378 donor,acceptor,Substring_chimera_prob(donor),/*acceptor_prob*/2.0,/*distance*/bestleft-origleft,
13561 donor,acceptor,Substring_siteD_prob(donor),/*acceptor_prob*/2.0,/*distance*/bestleft-origleft,
1437913562 /*shortdistancep*/true,localsplicing_penalty,querylength,
1438013563 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1438113564 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1439513578 /* End 2 */
1439613579 for (p = acceptors_plus[nmismatches]; p != NULL; p = p->rest) {
1439713580 acceptor = (Substring_T) p->first;
14398 endlength = Substring_chimera_pos(acceptor);
13581 endlength = Substring_siteA_pos(acceptor);
1439913582 support = querylength - endlength;
1440013583 chroffset = Substring_chroffset(acceptor);
1440113584
1441013593 debug4h(printf("End 2: short-overlap acceptor_plus: #%d:%u (%d mismatches) => searching left\n",
1441113594 Substring_chrnum(acceptor),(Chrpos_T) (rightbound+1-chroffset),Substring_nmismatches_whole(acceptor)));
1441213595
14413 if ((i = Substring_splicesites_knowni(acceptor)) >= 0) {
13596 if ((i = Substring_splicesitesA_knowni(acceptor)) >= 0) {
1441413597 origleft = Substring_genomicstart(acceptor);
1441513598 if ((splicesites_i =
1441613599 Splicetrie_find_left(&nmismatches_shortend,&nmismatches_list,i,
1442313606 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1442413607 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1442513608 debug4h(printf("End 2: short-overlap acceptor_plus: Successful ambiguous from acceptor #%d with amb_length %d\n",
14426 Substring_splicesites_knowni(acceptor),amb_length));
13609 Substring_splicesitesA_knowni(acceptor),amb_length));
1442713610 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14428 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_chimera_prob(acceptor),/*distance*/0U,
13611 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_siteA_prob(acceptor),/*distance*/0U,
1442913612 /*shortdistancep*/false,/*penalty*/0,querylength,
1443013613 ambcoords,/*ambcoords_acceptor*/NULL,
1443113614 /*amb_knowni_donor*/splicesites_i,/*amb_knowni_acceptor*/NULL,
1444313626 bestj = Intlist_head(splicesites_i);
1444413627 bestleft = splicesites[bestj] - endlength;
1444513628 if ((donor = Substring_new_donor(/*donor_coord*/splicesites[bestj],/*donor_knowni*/bestj,
14446 Substring_chimera_pos(acceptor),/*substring_querystart*/0,/*substring_queryend*/querylength,
13629 Substring_siteA_pos(acceptor),/*substring_querystart*/0,/*substring_queryend*/querylength,
1444713630 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_fwd,
1444813631 querylength,/*plusp*/true,genestrand,/*sensedir*/SENSE_FORWARD,
1444913632 Substring_chrnum(acceptor),Substring_chroffset(acceptor),
1445013633 Substring_chrhigh(acceptor),Substring_chrlength(acceptor))) != NULL) {
1445113634 debug4h(printf("End 2: short-overlap acceptor_plus: Successful splice from acceptor #%d to donor #%d\n",
14452 Substring_splicesites_knowni(acceptor),Substring_splicesites_knowni(donor)));
13635 Substring_splicesitesA_knowni(acceptor),Substring_splicesitesD_knowni(donor)));
1445313636 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14454 donor,acceptor,/*donor_prob*/2.0,Substring_chimera_prob(acceptor),/*distance*/origleft-bestleft,
13637 donor,acceptor,/*donor_prob*/2.0,Substring_siteA_prob(acceptor),/*distance*/origleft-bestleft,
1445513638 /*shortdistancep*/true,localsplicing_penalty,querylength,
1445613639 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1445713640 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1447113654 /* End 3 */
1447213655 for (p = donors_minus[nmismatches]; p != NULL; p = p->rest) {
1447313656 donor = (Substring_T) p->first;
14474 support = Substring_chimera_pos(donor);
13657 support = Substring_siteD_pos(donor);
1447513658 endlength = querylength - support;
1447613659 chroffset = Substring_chroffset(donor);
1447713660
1448613669 debug4h(printf("End 3: short-overlap donor_minus: #%d:%u (%d mismatches) => searching left\n",
1448713670 Substring_chrnum(donor),(Chrpos_T) (rightbound+1-chroffset),Substring_nmismatches_whole(donor)));
1448813671
14489 if ((i = Substring_splicesites_knowni(donor)) >= 0) {
13672 if ((i = Substring_splicesitesD_knowni(donor)) >= 0) {
1449013673 origleft = Substring_genomicend(donor);
1449113674 if ((splicesites_i =
1449213675 Splicetrie_find_left(&nmismatches_shortend,&nmismatches_list,i,
1449913682 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1450013683 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1450113684 debug4h(printf("End 3: short-overlap donor_minus: Successful ambiguous from donor #%d with amb_length %d\n",
14502 Substring_splicesites_knowni(donor),amb_length));
13685 Substring_splicesitesD_knowni(donor),amb_length));
1450313686 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14504 donor,/*acceptor*/NULL,Substring_chimera_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
13687 donor,/*acceptor*/NULL,Substring_siteD_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
1450513688 /*shortdistancep*/false,/*penalty*/0,querylength,
1450613689 /*ambcoords_donor*/NULL,ambcoords,
1450713690 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/splicesites_i,
1451913702 bestj = Intlist_head(splicesites_i);
1452013703 bestleft = splicesites[bestj] - endlength;
1452113704 if ((acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[bestj],/*acceptor_knowni*/bestj,
14522 querylength-Substring_chimera_pos(donor),
13705 querylength-Substring_siteD_pos(donor),
1452313706 /*substring_querystart*/0,/*substring_queryend*/querylength,
1452413707 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_rev,
1452513708 querylength,/*plusp*/false,genestrand,/*sensedir*/SENSE_FORWARD,
1452613709 Substring_chrnum(donor),Substring_chroffset(donor),
1452713710 Substring_chrhigh(donor),Substring_chrlength(donor))) != NULL) {
1452813711 debug4h(printf("End 3: short-overlap donor_minus: Successful splice from donor #%d to acceptor #%d\n",
14529 Substring_splicesites_knowni(donor),Substring_splicesites_knowni(acceptor)));
13712 Substring_splicesitesD_knowni(donor),Substring_splicesitesA_knowni(acceptor)));
1453013713 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14531 donor,acceptor,Substring_chimera_prob(donor),/*acceptor_prob*/2.0,/*distance*/origleft-bestleft,
13714 donor,acceptor,Substring_siteD_prob(donor),/*acceptor_prob*/2.0,/*distance*/origleft-bestleft,
1453213715 /*shortdistancep*/true,localsplicing_penalty,querylength,
1453313716 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1453413717 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1454813731 /* End 4 */
1454913732 for (p = acceptors_minus[nmismatches]; p != NULL; p = p->rest) {
1455013733 acceptor = (Substring_T) p->first;
14551 endlength = Substring_chimera_pos(acceptor);
13734 endlength = Substring_siteA_pos(acceptor);
1455213735 support = querylength - endlength;
1455313736 chrhigh = Substring_chrhigh(acceptor);
1455413737
1456413747 debug4h(printf("End 4: short-overlap acceptor_minus: #%d:%u (%d mismatches) => searching right\n",
1456513748 Substring_chrnum(acceptor),(Chrpos_T) (leftbound-1-chroffset),Substring_nmismatches_whole(acceptor)));
1456613749
14567 if ((i = Substring_splicesites_knowni(acceptor)) >= 0) {
13750 if ((i = Substring_splicesitesA_knowni(acceptor)) >= 0) {
1456813751 origleft = Substring_genomicend(acceptor);
1456913752 if ((splicesites_i =
1457013753 Splicetrie_find_right(&nmismatches_shortend,&nmismatches_list,i,
1457713760 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1457813761 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1457913762 debug4h(printf("End 4: short-overlap acceptor_minus: Successful ambiguous from acceptor #%d with amb_length %d\n",
14580 Substring_splicesites_knowni(acceptor),amb_length));
13763 Substring_splicesitesA_knowni(acceptor),amb_length));
1458113764 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14582 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_chimera_prob(acceptor),/*distance*/0U,
13765 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_siteA_prob(acceptor),/*distance*/0U,
1458313766 /*shortdistancep*/false,/*penalty*/0,querylength,
1458413767 ambcoords,/*ambcoords_acceptor*/NULL,
1458513768 /*amb_knowni_donor*/splicesites_i,/*amb_knowni_acceptor*/NULL,
1459713780 bestj = Intlist_head(splicesites_i);
1459813781 bestleft = splicesites[bestj] - support;
1459913782 if ((donor = Substring_new_donor(/*donor_coord*/splicesites[bestj],/*donor_knowni*/bestj,
14600 querylength-Substring_chimera_pos(acceptor),
13783 querylength-Substring_siteA_pos(acceptor),
1460113784 /*substring_querystart*/0,/*substring_queryend*/querylength,
1460213785 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_rev,
1460313786 querylength,/*plusp*/false,genestrand,/*sensedir*/SENSE_FORWARD,
1460413787 Substring_chrnum(acceptor),Substring_chroffset(acceptor),
1460513788 Substring_chrhigh(acceptor),Substring_chrlength(acceptor))) != NULL) {
1460613789 debug4h(printf("End 4: short-overlap acceptor_minus: Successful splice from acceptor #%d to donor #%d\n",
14607 Substring_splicesites_knowni(acceptor),Substring_splicesites_knowni(donor)));
13790 Substring_splicesitesA_knowni(acceptor),Substring_splicesitesD_knowni(donor)));
1460813791 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14609 donor,acceptor,/*donor_prob*/2.0,Substring_chimera_prob(acceptor),/*distance*/bestleft-origleft,
13792 donor,acceptor,/*donor_prob*/2.0,Substring_siteA_prob(acceptor),/*distance*/bestleft-origleft,
1461013793 /*shortdistancep*/true,localsplicing_penalty,querylength,
1461113794 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1461213795 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1462613809 /* End 5 */
1462713810 for (p = antidonors_plus[nmismatches]; p != NULL; p = p->rest) {
1462813811 donor = (Substring_T) p->first;
14629 endlength = Substring_chimera_pos(donor);
13812 endlength = Substring_siteD_pos(donor);
1463013813 support = querylength - endlength;
1463113814 chroffset = Substring_chroffset(donor);
1463213815
1464113824 debug4h(printf("End 5: short-overlap antidonor_plus: #%d:%u (%d mismatches) => searching left\n",
1464213825 Substring_chrnum(donor),(Chrpos_T) (rightbound+1-chroffset),Substring_nmismatches_whole(donor)));
1464313826
14644 if ((i = Substring_splicesites_knowni(donor)) >= 0) {
13827 if ((i = Substring_splicesitesD_knowni(donor)) >= 0) {
1464513828 origleft = Substring_genomicstart(donor);
1464613829 if ((splicesites_i =
1464713830 Splicetrie_find_left(&nmismatches_shortend,&nmismatches_list,i,
1465413837 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1465513838 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1465613839 debug4h(printf("End 5: short-overlap antidonor_plus: Successful ambiguous from antidonor #%d with amb_length %d\n",
14657 Substring_splicesites_knowni(donor),amb_length));
13840 Substring_splicesitesD_knowni(donor),amb_length));
1465813841 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14659 donor,/*acceptor*/NULL,Substring_chimera_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
13842 donor,/*acceptor*/NULL,Substring_siteD_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
1466013843 /*shortdistancep*/false,/*penalty*/0,querylength,
1466113844 /*ambcoords_donor*/NULL,ambcoords,
1466213845 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/splicesites_i,
1467413857 bestj = Intlist_head(splicesites_i);
1467513858 bestleft = splicesites[bestj] - endlength;
1467613859 if ((acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[bestj],/*acceptor_knowni*/bestj,
14677 Substring_chimera_pos(donor),
13860 Substring_siteD_pos(donor),
1467813861 /*substring_querystart*/0,/*substring_queryend*/querylength,
1467913862 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_fwd,
1468013863 querylength,/*plusp*/true,genestrand,/*sensedir*/SENSE_ANTI,
1468113864 Substring_chrnum(donor),Substring_chroffset(donor),
1468213865 Substring_chrhigh(donor),Substring_chrlength(donor))) != NULL) {
1468313866 debug4h(printf("End 5: short-overlap antidonor_plus: Successful splice from antidonor #%d to antiacceptor #%d\n",
14684 Substring_splicesites_knowni(donor),Substring_splicesites_knowni(acceptor)));
13867 Substring_splicesitesD_knowni(donor),Substring_splicesitesA_knowni(acceptor)));
1468513868 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14686 donor,acceptor,Substring_chimera_prob(donor),/*acceptor_prob*/2.0,/*distance*/origleft-bestleft,
13869 donor,acceptor,Substring_siteD_prob(donor),/*acceptor_prob*/2.0,/*distance*/origleft-bestleft,
1468713870 /*shortdistancep*/true,localsplicing_penalty,querylength,
1468813871 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1468913872 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1470313886 /* End 6 */
1470413887 for (p = antiacceptors_plus[nmismatches]; p != NULL; p = p->rest) {
1470513888 acceptor = (Substring_T) p->first;
14706 support = Substring_chimera_pos(acceptor);
13889 support = Substring_siteA_pos(acceptor);
1470713890 endlength = querylength - support;
1470813891 chrhigh = Substring_chrhigh(acceptor);
1470913892
1471913902 debug4h(printf("End 6: short-overlap antiacceptor_plus: #%d:%u (%d mismatches) => searching right\n",
1472013903 Substring_chrnum(acceptor),(Chrpos_T) (leftbound-1-chroffset),Substring_nmismatches_whole(acceptor)));
1472113904
14722 if ((i = Substring_splicesites_knowni(acceptor)) >= 0) {
13905 if ((i = Substring_splicesitesA_knowni(acceptor)) >= 0) {
1472313906 origleft = Substring_genomicstart(acceptor);
1472413907 if ((splicesites_i =
1472513908 Splicetrie_find_right(&nmismatches_shortend,&nmismatches_list,i,
1473213915 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1473313916 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1473413917 debug4h(printf("End 6: short-overlap antiacceptor_plus: Successful ambiguous from antiacceptor #%d with amb_length %d\n",
14735 Substring_splicesites_knowni(acceptor),amb_length));
13918 Substring_splicesitesA_knowni(acceptor),amb_length));
1473613919 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14737 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_chimera_prob(acceptor),/*distance*/0U,
13920 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_siteA_prob(acceptor),/*distance*/0U,
1473813921 /*shortdistancep*/false,/*penalty*/0,querylength,
1473913922 ambcoords,/*ambcoords_acceptor*/NULL,
1474013923 /*amb_knowni_donor*/splicesites_i,/*amb_knowni_acceptor*/NULL,
1475213935 bestj = Intlist_head(splicesites_i);
1475313936 bestleft = splicesites[bestj] - support;
1475413937 if ((donor = Substring_new_donor(/*donor_coord*/splicesites[bestj],/*donor_knowni*/bestj,
14755 Substring_chimera_pos(acceptor),
13938 Substring_siteA_pos(acceptor),
1475613939 /*substring_querystart*/0,/*substring_queryend*/querylength,
1475713940 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_fwd,
1475813941 querylength,/*plusp*/true,genestrand,/*sensedir*/SENSE_ANTI,
1475913942 Substring_chrnum(acceptor),Substring_chroffset(acceptor),
1476013943 Substring_chrhigh(acceptor),Substring_chrlength(acceptor))) != NULL) {
1476113944 debug4h(printf("End 6: short-overlap antiacceptor_plus: Successful splice from antiacceptor #%d to antidonor #%d\n",
14762 Substring_splicesites_knowni(acceptor),Substring_splicesites_knowni(donor)));
13945 Substring_splicesitesA_knowni(acceptor),Substring_splicesitesD_knowni(donor)));
1476313946 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14764 donor,acceptor,/*donor_prob*/2.0,Substring_chimera_prob(acceptor),/*distance*/bestleft-origleft,
13947 donor,acceptor,/*donor_prob*/2.0,Substring_siteA_prob(acceptor),/*distance*/bestleft-origleft,
1476513948 /*shortdistancep*/true,localsplicing_penalty,querylength,
1476613949 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1476713950 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1478113964 /* End 7 */
1478213965 for (p = antidonors_minus[nmismatches]; p != NULL; p = p->rest) {
1478313966 donor = (Substring_T) p->first;
14784 endlength = Substring_chimera_pos(donor);
13967 endlength = Substring_siteD_pos(donor);
1478513968 support = querylength - endlength;
1478613969 chrhigh = Substring_chrhigh(donor);
1478713970
1479713980 debug4h(printf("End 7: short-overlap antidonor_minus: #%d:%u (%d mismatches) => searching right\n",
1479813981 Substring_chrnum(donor),(Chrpos_T) (leftbound-1-chroffset),Substring_nmismatches_whole(donor)));
1479913982
14800 if ((i = Substring_splicesites_knowni(donor)) >= 0) {
13983 if ((i = Substring_splicesitesD_knowni(donor)) >= 0) {
1480113984 origleft = Substring_genomicend(donor);
1480213985 if ((splicesites_i =
1480313986 Splicetrie_find_right(&nmismatches_shortend,&nmismatches_list,i,
1481013993 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1481113994 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1481213995 debug4h(printf("End 7: short-overlap antidonor_minus: Successful ambiguous from antidonor #%d with amb_length %d\n",
14813 Substring_splicesites_knowni(donor),amb_length));
13996 Substring_splicesitesD_knowni(donor),amb_length));
1481413997 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14815 donor,/*acceptor*/NULL,Substring_chimera_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
13998 donor,/*acceptor*/NULL,Substring_siteD_prob(donor),Doublelist_max(probs_list),/*distance*/0U,
1481613999 /*shortdistancep*/false,/*penalty*/0,querylength,
1481714000 /*ambcoords_donor*/NULL,ambcoords,
1481814001 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/splicesites_i,
1483014013 bestj = Intlist_head(splicesites_i);
1483114014 bestleft = splicesites[bestj] - support;
1483214015 if ((acceptor = Substring_new_acceptor(/*acceptor_coord*/splicesites[bestj],/*acceptor_knowni*/bestj,
14833 querylength-Substring_chimera_pos(donor),
14016 querylength-Substring_siteD_pos(donor),
1483414017 /*substring_querystart*/0,/*substring_queryend*/querylength,
1483514018 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_rev,
1483614019 querylength,/*plusp*/false,genestrand,/*sensedir*/SENSE_ANTI,
1483714020 Substring_chrnum(donor),Substring_chroffset(donor),
1483814021 Substring_chrhigh(donor),Substring_chrlength(donor))) != NULL) {
1483914022 debug4h(printf("End 7: short-overlap antidonor_minus: Successful splice from antidonor #%d to antiacceptor #%d\n",
14840 Substring_splicesites_knowni(donor),Substring_splicesites_knowni(acceptor)));
14023 Substring_splicesitesD_knowni(donor),Substring_splicesitesA_knowni(acceptor)));
1484114024 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches,nmismatches_shortend,
14842 donor,acceptor,Substring_chimera_prob(donor),/*acceptor_prob*/2.0,/*distance*/bestleft-origleft,
14025 donor,acceptor,Substring_siteD_prob(donor),/*acceptor_prob*/2.0,/*distance*/bestleft-origleft,
1484314026 /*shortdistancep*/true,localsplicing_penalty,querylength,
1484414027 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1484514028 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1485914042 /* End 8 */
1486014043 for (p = antiacceptors_minus[nmismatches]; p != NULL; p = p->rest) {
1486114044 acceptor = (Substring_T) p->first;
14862 support = Substring_chimera_pos(acceptor);
14045 support = Substring_siteA_pos(acceptor);
1486314046 endlength = querylength - support;
1486414047 chroffset = Substring_chroffset(acceptor);
1486514048
1487414057 debug4h(printf("End 8: short-overlap antiacceptor_minus: #%d:%u (%d mismatches) => searching left\n",
1487514058 Substring_chrnum(acceptor),(Chrpos_T) (rightbound+1-chroffset),Substring_nmismatches_whole(acceptor)));
1487614059
14877 if ((i = Substring_splicesites_knowni(acceptor)) >= 0) {
14060 if ((i = Substring_splicesitesA_knowni(acceptor)) >= 0) {
1487814061 origleft = Substring_genomicend(acceptor);
1487914062 if ((splicesites_i =
1488014063 Splicetrie_find_left(&nmismatches_shortend,&nmismatches_list,i,
1488714070 ambcoords = lookup_splicesites(&probs_list,splicesites_i,splicesites);
1488814071 debug4h(amb_length = endlength /*- nmismatches_shortend*/);
1488914072 debug4h(printf("End 8: short-overlap antiacceptor_minus: Successful ambiguous from antiacceptor #%d with amb_length %d\n",
14890 Substring_splicesites_knowni(acceptor),amb_length));
14073 Substring_splicesitesA_knowni(acceptor),amb_length));
1489114074 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14892 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_chimera_prob(acceptor),/*distance*/0U,
14075 /*donor*/NULL,acceptor,Doublelist_max(probs_list),Substring_siteA_prob(acceptor),/*distance*/0U,
1489314076 /*shortdistancep*/false,/*penalty*/0,querylength,
1489414077 ambcoords,/*ambcoords_acceptor*/NULL,
1489514078 /*amb_knowni_donor*/splicesites_i,/*amb_knowni_acceptor*/NULL,
1490714090 bestj = Intlist_head(splicesites_i);
1490814091 bestleft = splicesites[bestj] - endlength;
1490914092 if ((donor = Substring_new_donor(/*donor_coord*/splicesites[bestj],/*donor_knowni*/bestj,
14910 querylength-Substring_chimera_pos(acceptor),
14093 querylength-Substring_siteA_pos(acceptor),
1491114094 /*substring_querystart*/0,/*substring_queryend*/querylength,
1491214095 nmismatches_shortend,/*prob*/2.0,/*left*/bestleft,query_compress_rev,
1491314096 querylength,/*plusp*/false,genestrand,/*sensedir*/SENSE_ANTI,
1491414097 Substring_chrnum(acceptor),Substring_chroffset(acceptor),
1491514098 Substring_chrhigh(acceptor),Substring_chrlength(acceptor))) != NULL) {
1491614099 debug4h(printf("End 8: short-overlap antiacceptor_minus: Successful splice from antiacceptor #%d to antidonor #%d\n",
14917 Substring_splicesites_knowni(acceptor),Substring_splicesites_knowni(donor)));
14100 Substring_splicesitesA_knowni(acceptor),Substring_splicesitesD_knowni(donor)));
1491814101 hits = List_push(hits,(void *) Stage3end_new_splice(&(*found_score),nmismatches_shortend,nmismatches,
14919 donor,acceptor,/*donor_prob*/2.0,Substring_chimera_prob(acceptor),/*distance*/origleft-bestleft,
14102 donor,acceptor,/*donor_prob*/2.0,Substring_siteA_prob(acceptor),/*distance*/origleft-bestleft,
1492014103 /*shortdistancep*/true,localsplicing_penalty,querylength,
1492114104 /*ambcoords_donor*/NULL,/*ambcoords_acceptor*/NULL,
1492214105 /*amb_knowni_donor*/NULL,/*amb_knowni_acceptor*/NULL,
1504814231 } else if (*any_omitted_p) {
1504914232 floors = Floors_new_omitted(querylength,max_end_insertions,this->omitted);
1505014233 *alloc_floors_p = true;
15051 } else if (querylength > MAX_READLENGTH) {
14234 } else if (querylength > max_floors_readlength) {
1505214235 floors = Floors_new_standard(querylength,max_end_insertions,/*keep_floors_p*/false);
1505314236 *alloc_floors_p = true;
1505414237 } else if (keep_floors_p == false) {
1831017493 /* Search 3: Subs/indels via complete set */
1831117494
1831217495 /* 4, 5. Complete set mismatches and indels, omitting frequent oligos */
18313 completesetp = false;
18314 for (q = subs; q != NULL; q = List_next(q)) {
18315 hit = (Stage3end_T) List_head(q);
18316 debug(printf("Hit has total score of %d\n",Stage3end_score(hit)));
18317 if (Stage3end_score(hit) > done_level) {
18318 completesetp = true;
17496 if (subs == NULL) {
17497 completesetp = true;
17498 } else {
17499 completesetp = false;
17500 for (q = subs; q != NULL; q = List_next(q)) {
17501 hit = (Stage3end_T) List_head(q);
17502 debug(printf("Hit has total score of %d\n",Stage3end_score(hit)));
17503 if (Stage3end_score(hit) > done_level) {
17504 completesetp = true;
17505 }
1831917506 }
1832017507 }
1832117508 debug(printf("completesetp %d\n",completesetp));
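Note on the hunk above: the new code sets `completesetp` to true when `subs` is empty, where the old loop left it false. The sketch below restates that decision in isolation. It is an illustration only: the function name `need_complete_set` and the plain `int` score array are hypothetical stand-ins for the `List_T` of `Stage3end_T` hits used in the source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical restatement of the completesetp test.
   scores[] stands in for Stage3end_score() over the subs list. */
bool need_complete_set(const int *scores, size_t nhits, int done_level) {
  size_t i;

  if (nhits == 0) {
    /* No substitution hits at all: run the complete-set search. */
    return true;
  }
  for (i = 0; i < nhits; i++) {
    if (scores[i] > done_level) {
      /* At least one hit scores worse than done_level. */
      return true;
    }
  }
  return false;
}
```

With `done_level` 3, scores {1, 2} skip the complete-set search, while {1, 5} or an empty list trigger it.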
1846217649 }
1846317650 #endif
1846417651
18465 if (knownsplicingp == true && done_level >= localsplicing_penalty) {
17652 if (knownsplicingp == true && done_level >= localsplicing_penalty &&
17653 (max_splice_mismatches = done_level - localsplicing_penalty) >= 0) {
1846617654 /* Want >= and not > to give better results. Negligible effect on speed. */
1846717655 /* 8. Shortend splicing */
18468
18469 max_splice_mismatches = done_level - localsplicing_penalty;
1847017656 debug(printf("*** Stage 8. Short-end splicing, allowing %d mismatches ***\n",max_splice_mismatches));
1847117657
1847217658 donors_plus = (List_T *) CALLOCA(max_splice_mismatches+1,sizeof(List_T));
1857417760 debug(printf("Skipping distant splicing because done_level %d < distantsplicing_penalty %d and min_trim %d < %d\n",
1857517761 done_level,distantsplicing_penalty,min_trim,min_distantsplicing_end_matches));
1857617762
18577 } else if (find_dna_chimeras_p == true) {
17763 } else if (find_dna_chimeras_p == true &&
17764 (max_splice_mismatches = done_level - distantsplicing_penalty) >= 0) {
1857817765 /* 9 (DNA). Find distant splicing for DNA */
18579 max_splice_mismatches = done_level - distantsplicing_penalty;
1858017766 debug(printf("*** Stage 9 (DNA). Distant splice ends, allowing %d mismatches ***\n",max_splice_mismatches));
1858117767
1858217768 startfrags_plus = (List_T *) CALLOCA(max_splice_mismatches+1,sizeof(List_T));
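Note on the hunk above: `max_splice_mismatches` is now assigned inside the branch condition, so the `max_splice_mismatches+1` allocation is reached only when the value is non-negative. A minimal sketch of that guard follows, assuming standard `calloc` in place of the project's `CALLOCA` macro. The function `alloc_levels` and its parameters are hypothetical names, not part of the source.

```c
#include <stdlib.h>

/* Hypothetical helper: computes the mismatch budget and allocates one
   zeroed slot per mismatch level 0..budget. Returns NULL, with the
   budget still stored, when the budget is negative or calloc fails. */
int *alloc_levels(int done_level, int penalty, int *max_mismatches) {
  if ((*max_mismatches = done_level - penalty) >= 0) {
    /* Safe: the element count is at least 1 here. */
    return (int *) calloc((size_t) (*max_mismatches + 1), sizeof(int));
  }
  /* Negative budget: skip the stage rather than allocate. */
  return NULL;
}
```

The caller owns the returned array and releases it with `free`.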
1861717803 debug(printf("*** Stage 9 (DNA). Distant splicing, allowing %d mismatches ***\n",nmismatches));
1861817804
1861917805 debug4e(printf("Sorting splice ends\n"));
18620 startfrags_plus[nmismatches] = Substring_sort_chimera_halves(startfrags_plus[nmismatches],/*ascendingp*/true);
18621 endfrags_plus[nmismatches] = Substring_sort_chimera_halves(endfrags_plus[nmismatches],/*ascendingp*/true);
18622
18623 startfrags_minus[nmismatches] = Substring_sort_chimera_halves(startfrags_minus[nmismatches],/*ascendingp*/false);
18624 endfrags_minus[nmismatches] = Substring_sort_chimera_halves(endfrags_minus[nmismatches],/*ascendingp*/false);
17806 startfrags_plus[nmismatches] = Substring_sort_siteN_halves(startfrags_plus[nmismatches],/*ascendingp*/true);
17807 endfrags_plus[nmismatches] = Substring_sort_siteN_halves(endfrags_plus[nmismatches],/*ascendingp*/true);
17808
17809 startfrags_minus[nmismatches] = Substring_sort_siteN_halves(startfrags_minus[nmismatches],/*ascendingp*/false);
17810 endfrags_minus[nmismatches] = Substring_sort_siteN_halves(endfrags_minus[nmismatches],/*ascendingp*/false);
1862517811
1862617812 debug4e(printf("Splice ends at %d nmismatches: +startfrags/endfrags %d/%d, -startfrags/endfrags %d/%d\n",
1862717813 nmismatches,
1868217868 FREEA(startfrags_minus);
1868317869 FREEA(endfrags_minus);
1868417870
18685 } else if (knownsplicingp || novelsplicingp) {
17871 } else if ((knownsplicingp || novelsplicingp) &&
17872 (max_splice_mismatches = done_level - distantsplicing_penalty) >= 0) {
1868617873 /* 9 (RNA). Find distant splicing for RNA iteratively using both known and novel splice sites */
18687 max_splice_mismatches = done_level - distantsplicing_penalty;
1868817874 debug(printf("*** Stage 9 (RNA). Distant splice ends, allowing %d mismatches ***\n",max_splice_mismatches));
1868917875
1869017876 donors_plus = (List_T *) CALLOCA(max_splice_mismatches+1,sizeof(List_T));
1873317919 debug(printf("*** Stage 9 (RNA). Distant splicing, allowing %d mismatches ***\n",nmismatches));
1873417920
1873517921 debug4e(printf("Sorting splice ends\n"));
18736 donors_plus[nmismatches] = Substring_sort_chimera_halves(donors_plus[nmismatches],/*ascendingp*/true);
18737 acceptors_plus[nmismatches] = Substring_sort_chimera_halves(acceptors_plus[nmismatches],/*ascendingp*/true);
18738
18739 antidonors_plus[nmismatches] = Substring_sort_chimera_halves(antidonors_plus[nmismatches],/*ascendingp*/false);
18740 antiacceptors_plus[nmismatches] = Substring_sort_chimera_halves(antiacceptors_plus[nmismatches],/*ascendingp*/false);
18741
18742 donors_minus[nmismatches] = Substring_sort_chimera_halves(donors_minus[nmismatches],/*ascendingp*/false);
18743 acceptors_minus[nmismatches] = Substring_sort_chimera_halves(acceptors_minus[nmismatches],/*ascendingp*/false);
18744
18745 antidonors_minus[nmismatches] = Substring_sort_chimera_halves(antidonors_minus[nmismatches],/*ascendingp*/true);
18746 antiacceptors_minus[nmismatches] = Substring_sort_chimera_halves(antiacceptors_minus[nmismatches],/*ascendingp*/true);
17922 donors_plus[nmismatches] = Substring_sort_siteD_halves(donors_plus[nmismatches],/*ascendingp*/true);
17923 acceptors_plus[nmismatches] = Substring_sort_siteA_halves(acceptors_plus[nmismatches],/*ascendingp*/true);
17924
17925 antidonors_plus[nmismatches] = Substring_sort_siteD_halves(antidonors_plus[nmismatches],/*ascendingp*/false);
17926 antiacceptors_plus[nmismatches] = Substring_sort_siteA_halves(antiacceptors_plus[nmismatches],/*ascendingp*/false);
17927
17928 donors_minus[nmismatches] = Substring_sort_siteD_halves(donors_minus[nmismatches],/*ascendingp*/false);
17929 acceptors_minus[nmismatches] = Substring_sort_siteA_halves(acceptors_minus[nmismatches],/*ascendingp*/false);
17930
17931 antidonors_minus[nmismatches] = Substring_sort_siteD_halves(antidonors_minus[nmismatches],/*ascendingp*/true);
17932 antiacceptors_minus[nmismatches] = Substring_sort_siteA_halves(antiacceptors_minus[nmismatches],/*ascendingp*/true);
1874717933
1874817934 debug4e(printf("Splice ends at %d nmismatches: +donors/acceptors %d/%d, +antidonors/antiacceptors %d/%d, -donors/acceptors %d/%d, -antidonors/antiacceptors %d/%d\n",
1874917935 nmismatches,
1899018176 int querylength, query_lastpos, cutoff_level;
1899118177 char *queryuc_ptr, *quality_string;
1899218178 Compress_T query_compress_fwd = NULL, query_compress_rev = NULL;
18179 char *queryrc;
18180
18181 querylength = Shortread_fulllength(queryseq);
1899318182
1899418183 #ifdef HAVE_ALLOCA
18995 char *queryrc;
18996 #else
18997 char queryrc[MAX_READLENGTH+1];
18998 #endif
18999
19000 querylength = Shortread_fulllength(queryseq);
19001
19002 #ifndef HAVE_ALLOCA
19003 if (querylength > MAX_READLENGTH) {
19004 fprintf(stderr,"Read %s has length %d > MAX_READLENGTH %d. Either run configure and make again with a higher value of MAX_READLENGTH, or consider using GMAP instead.\n",
19005 Shortread_accession(queryseq),querylength,MAX_READLENGTH);
19006 *npaths_primary = *npaths_altloc = 0;
19007 return (Stage3end_T *) NULL;
19008 }
18184 if (querylength <= MAX_STACK_READLENGTH) {
18185 queryrc = (char *) ALLOCA((querylength+1)*sizeof(int));
18186 } else {
18187 queryrc = (char *) MALLOC((querylength+1)*sizeof(int));
18188 }
18189 #else
18190 queryrc = (char *) MALLOC((querylength+1)*sizeof(int));
1900918191 #endif
1901018192
1901118193 if (user_maxlevel_float < 0.0) {
1903318215
1903418216 query_compress_fwd = Compress_new_fwd(queryuc_ptr,querylength);
1903518217 query_compress_rev = Compress_new_rev(queryuc_ptr,querylength);
19036 #ifdef HAVE_ALLOCA
19037 queryrc = (char *) ALLOCA((querylength+1)*sizeof(int));
19038 #endif
1903918218 make_complement_buffered(queryrc,queryuc_ptr,querylength);
1904018219
1904118220 this = Stage1_new(querylength);
1906918248 Compress_free(&query_compress_fwd);
1907018249 Compress_free(&query_compress_rev);
1907118250 Stage1_free(&this,querylength);
18251
18252 #ifdef HAVE_ALLOCA
18253 if (querylength <= MAX_STACK_READLENGTH) {
18254 FREEA(queryrc);
18255 } else {
18256 FREE(queryrc);
18257 }
18258 #else
18259 FREE(queryrc);
18260 #endif
18261
1907218262 return stage3array;
1907318263 }
1907418264
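Note on the hunk above: the function now picks stack or heap storage for `queryrc` by comparing `querylength` with `MAX_STACK_READLENGTH`, and repeats the same comparison at release time. The sketch below shows that pattern on its own, under stated assumptions: plain `alloca`, `malloc` and `free` replace the project's `ALLOCA`, `MALLOC`, `FREEA` and `FREE` macros, whose definitions are not shown in this diff. `STACK_LIMIT`, `complement_base` and `revcomp_palindrome_p` are hypothetical names, and `length` is assumed to be non-negative.

```c
#include <alloca.h>  /* non-standard header; present on glibc and macOS */
#include <stdlib.h>
#include <string.h>

#define STACK_LIMIT 300  /* hypothetical stand-in for MAX_STACK_READLENGTH */

static char complement_base(char c) {
  switch (c) {
  case 'A': return 'T';
  case 'C': return 'G';
  case 'G': return 'C';
  case 'T': return 'A';
  default:  return 'N';
  }
}

/* Returns 1 if seq equals its own reverse complement, 0 if not,
   and -1 if the heap allocation fails. */
int revcomp_palindrome_p(const char *seq, int length) {
  int on_stack = (length <= STACK_LIMIT);
  char *buf;
  int i, result;

  if (on_stack) {
    /* Short read: the buffer lives in this function's stack frame. */
    buf = (char *) alloca((size_t) length + 1);
  } else {
    /* Long read: fall back to the heap. */
    buf = (char *) malloc((size_t) length + 1);
    if (buf == NULL) return -1;
  }

  for (i = 0; i < length; i++) {
    buf[i] = complement_base(seq[length - 1 - i]);
  }
  buf[length] = '\0';
  result = (memcmp(buf, seq, (size_t) length) == 0);

  /* Release with the same test used at allocation time;
     alloca memory is reclaimed when the function returns. */
  if (!on_stack) free(buf);
  return result;
}
```

The `alloca` call has to sit in the function that uses the buffer, since the memory belongs to that function's frame. This is presumably why the source wraps it in a macro and not a helper function.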
1909418284 char *queryuc_ptr, *quality_string;
1909518285 Compress_T query_compress_fwd = NULL, query_compress_rev = NULL;
1909618286 bool allvalidp;
18287 char *queryrc;
18288
18289 querylength = Shortread_fulllength(queryseq);
1909718290
1909818291 #ifdef HAVE_ALLOCA
19099 char *queryrc;
19100 #else
19101 char queryrc[MAX_READLENGTH+1];
19102 #endif
19103
19104 querylength = Shortread_fulllength(queryseq);
19105
19106 #ifndef HAVE_ALLOCA
19107 if (querylength > MAX_READLENGTH) {
19108 fprintf(stderr,"Read %s has length %d > MAX_READLENGTH %d. Either run configure and make again with a higher value of MAX_READLENGTH, or consider using GMAP instead.\n",
19109 Shortread_accession(queryseq),querylength,MAX_READLENGTH);
19110 *npaths_primary = *npaths_altloc = 0;
19111 return (Stage3end_T *) NULL;
19112 }
18292 if (querylength <= MAX_STACK_READLENGTH) {
18293 queryrc = (char *) ALLOCA((querylength+1)*sizeof(int));
18294 } else {
18295 queryrc = (char *) MALLOC((querylength+1)*sizeof(int));
18296 }
18297 #else
18298 queryrc = (char *) MALLOC((querylength+1)*sizeof(int));
1911318299 #endif
1911418300
1911518301 if (user_maxlevel_float < 0.0) {
1914318329 query_compress_fwd = Compress_new_fwd(queryuc_ptr,querylength);
1914418330 query_compress_rev = Compress_new_rev(queryuc_ptr,querylength);
1914518331 gmap_history = History_new();
19146 #ifdef HAVE_ALLOCA
19147 queryrc = (char *) ALLOCA((querylength+1)*sizeof(char));
19148 #endif
1914918332 make_complement_buffered(queryrc,queryuc_ptr,querylength);
1915018333
1915118334 if (read_oligos(&allvalidp,this_geneplus,queryuc_ptr,querylength,query_lastpos,/*genestrand*/+1) > 0) {
1919818381 Compress_free(&query_compress_rev);
1919918382 Stage1_free(&this_geneminus,querylength);
1920018383 Stage1_free(&this_geneplus,querylength);
18384
18385 #ifdef HAVE_ALLOCA
18386 if (querylength <= MAX_STACK_READLENGTH) {
18387 FREEA(queryrc);
18388 } else {
18389 FREE(queryrc);
18390 }
18391 #else
18392 FREE(queryrc);
18393 #endif
18394
1920118395 return stage3array;
1920218396 }
1920318397
2100320197 }
2100420198 debug(printf("Test for completeset using better_free_end_exists_p: completeset5p %d, completeset3p %d\n",completeset5p,completeset3p));
2100520199 #endif
20200
20201 #if 0
20202 } else {
20203 /* This causes very slow running time */
20204 if (subs5 == NULL) {
20205 completeset5p = true;
20206 }
20207 if (subs3 == NULL) {
20208 completeset3p = true;
20209 }
20210 #endif
2100620211 }
2100720212
2100820213 if (querylength5 < min_kmer_readlength) {
2154020745 nmismatches,max_splice_mismatches_5));
2154120746
2154220747 debug4e(printf("Sorting splice ends\n"));
21543 donors_plus_5[nmismatches] = Substring_sort_chimera_halves(donors_plus_5[nmismatches],/*ascendingp*/true);
21544 acceptors_plus_5[nmismatches] = Substring_sort_chimera_halves(acceptors_plus_5[nmismatches],/*ascendingp*/true);
21545
21546 antidonors_plus_5[nmismatches] = Substring_sort_chimera_halves(antidonors_plus_5[nmismatches],/*ascendingp*/false);
21547 antiacceptors_plus_5[nmismatches] = Substring_sort_chimera_halves(antiacceptors_plus_5[nmismatches],/*ascendingp*/false);
21548
21549 donors_minus_5[nmismatches] = Substring_sort_chimera_halves(donors_minus_5[nmismatches],/*ascendingp*/false);
21550 acceptors_minus_5[nmismatches] = Substring_sort_chimera_halves(acceptors_minus_5[nmismatches],/*ascendingp*/false);
21551
21552 antidonors_minus_5[nmismatches] = Substring_sort_chimera_halves(antidonors_minus_5[nmismatches],/*ascendingp*/true);
21553 antiacceptors_minus_5[nmismatches] = Substring_sort_chimera_halves(antiacceptors_minus_5[nmismatches],/*ascendingp*/true);
20748 donors_plus_5[nmismatches] = Substring_sort_siteD_halves(donors_plus_5[nmismatches],/*ascendingp*/true);
20749 acceptors_plus_5[nmismatches] = Substring_sort_siteA_halves(acceptors_plus_5[nmismatches],/*ascendingp*/true);
20750
20751 antidonors_plus_5[nmismatches] = Substring_sort_siteD_halves(antidonors_plus_5[nmismatches],/*ascendingp*/false);
20752 antiacceptors_plus_5[nmismatches] = Substring_sort_siteA_halves(antiacceptors_plus_5[nmismatches],/*ascendingp*/false);
20753
20754 donors_minus_5[nmismatches] = Substring_sort_siteD_halves(donors_minus_5[nmismatches],/*ascendingp*/false);
20755 acceptors_minus_5[nmismatches] = Substring_sort_siteA_halves(acceptors_minus_5[nmismatches],/*ascendingp*/false);
20756
20757 antidonors_minus_5[nmismatches] = Substring_sort_siteD_halves(antidonors_minus_5[nmismatches],/*ascendingp*/true);
20758 antiacceptors_minus_5[nmismatches] = Substring_sort_siteA_halves(antiacceptors_minus_5[nmismatches],/*ascendingp*/true);
2155420759
2155520760 debug4e(printf("Splice ends at %d nmismatches: +donors/acceptors %d/%d, +antidonors/antiacceptors %d/%d, -donors/acceptors %d/%d, -antidonors/antiacceptors %d/%d\n",
2155620761 nmismatches,
2164120846 nmismatches,max_splice_mismatches_3));
2164220847
2164320848 debug4e(printf("Sorting splice ends\n"));
21644 donors_plus_3[nmismatches] = Substring_sort_chimera_halves(donors_plus_3[nmismatches],/*ascendingp*/true);
21645 acceptors_plus_3[nmismatches] = Substring_sort_chimera_halves(acceptors_plus_3[nmismatches],/*ascendingp*/true);
21646
21647 antidonors_plus_3[nmismatches] = Substring_sort_chimera_halves(antidonors_plus_3[nmismatches],/*ascendingp*/false);
21648 antiacceptors_plus_3[nmismatches] = Substring_sort_chimera_halves(antiacceptors_plus_3[nmismatches],/*ascendingp*/false);
21649
21650 donors_minus_3[nmismatches] = Substring_sort_chimera_halves(donors_minus_3[nmismatches],/*ascendingp*/false);
21651 acceptors_minus_3[nmismatches] = Substring_sort_chimera_halves(acceptors_minus_3[nmismatches],/*ascendingp*/false);
21652
21653 antidonors_minus_3[nmismatches] = Substring_sort_chimera_halves(antidonors_minus_3[nmismatches],/*ascendingp*/true);
21654 antiacceptors_minus_3[nmismatches] = Substring_sort_chimera_halves(antiacceptors_minus_3[nmismatches],/*ascendingp*/true);
20849 donors_plus_3[nmismatches] = Substring_sort_siteD_halves(donors_plus_3[nmismatches],/*ascendingp*/true);
20850 acceptors_plus_3[nmismatches] = Substring_sort_siteA_halves(acceptors_plus_3[nmismatches],/*ascendingp*/true);
20851
20852 antidonors_plus_3[nmismatches] = Substring_sort_siteD_halves(antidonors_plus_3[nmismatches],/*ascendingp*/false);
20853 antiacceptors_plus_3[nmismatches] = Substring_sort_siteA_halves(antiacceptors_plus_3[nmismatches],/*ascendingp*/false);
20854
20855 donors_minus_3[nmismatches] = Substring_sort_siteD_halves(donors_minus_3[nmismatches],/*ascendingp*/false);
20856 acceptors_minus_3[nmismatches] = Substring_sort_siteA_halves(acceptors_minus_3[nmismatches],/*ascendingp*/false);
20857
20858 antidonors_minus_3[nmismatches] = Substring_sort_siteD_halves(antidonors_minus_3[nmismatches],/*ascendingp*/true);
20859 antiacceptors_minus_3[nmismatches] = Substring_sort_siteA_halves(antiacceptors_minus_3[nmismatches],/*ascendingp*/true);
2165520860
2165620861 debug4e(printf("Splice ends at %d nmismatches: +donors/acceptors %d/%d, +antidonors/antiacceptors %d/%d, -donors/acceptors %d/%d, -antidonors/antiacceptors %d/%d\n",
2165720862 nmismatches,
2315222357 int maxpairedpaths = maxpaths_search; /* 100000 */
2315322358 #endif
2315422359 bool abort_pairing_p;
23155
23156 #ifdef HAVE_ALLOCA
2315722360 char *queryrc5, *queryrc3;
23158 #else
23159 char queryrc5[MAX_READLENGTH+1], queryrc3[MAX_READLENGTH+1];
23160 #endif
23161
2316222361
2316322362 querylength5 = Shortread_fulllength(queryseq5);
2316422363 querylength3 = Shortread_fulllength(queryseq3);
2316522364
23166 #ifndef HAVE_ALLOCA
23167 if (querylength5 > MAX_READLENGTH || querylength3 > MAX_READLENGTH) {
23168 fprintf(stderr,"Paired-read %s has lengths %d and %d > MAX_READLENGTH %d. Either run configure and make again with a higher value of MAX_READLENGTH, or consider using GMAP instead.\n",
23169 Shortread_accession(queryseq5),querylength5,querylength3,MAX_READLENGTH);
23170 *npaths_primary = *npaths_altloc = 0;
23171 *nhits5_primary = *nhits5_altloc = 0;
23172 *nhits3_primary = *nhits3_altloc = 0;
23173 *stage3array5 = *stage3array3 = (Stage3end_T *) NULL;
23174 return (Stage3pair_T *) NULL;
23175 }
23176 #else
23177 queryrc5 = (char *) ALLOCA((querylength5+1)*sizeof(char));
23178 queryrc3 = (char *) ALLOCA((querylength3+1)*sizeof(char));
22365 #ifdef HAVE_ALLOCA
22366 if (querylength5 <= MAX_STACK_READLENGTH) {
22367 queryrc5 = (char *) ALLOCA((querylength5+1)*sizeof(char));
22368 } else {
22369 queryrc5 = (char *) MALLOC((querylength5+1)*sizeof(char));
22370 }
22371 if (querylength3 <= MAX_STACK_READLENGTH) {
22372 queryrc3 = (char *) ALLOCA((querylength3+1)*sizeof(char));
22373 } else {
22374 queryrc3 = (char *) MALLOC((querylength3+1)*sizeof(char));
22375 }
22376 #else
22377 queryrc5 = (char *) MALLOC((querylength5+1)*sizeof(char));
22378 queryrc3 = (char *) MALLOC((querylength3+1)*sizeof(char));
2317922379 #endif
2318022380
2318122381 if (user_maxlevel_float < 0.0) {
2327322473 Compress_free(&query3_compress_rev);
2327422474 Stage1_free(&this5,querylength5);
2327522475 Stage1_free(&this3,querylength3);
23276 return (Stage3pair_T *) NULL;
22476
22477 stage3pairarray = (Stage3pair_T *) NULL;
2327722478
2327822479 } else {
2327922480 stage3pairarray =
2329522496 Compress_free(&query3_compress_rev);
2329622497 Stage1_free(&this5,querylength5);
2329722498 Stage1_free(&this3,querylength3);
23298 return stage3pairarray;
23299 }
22499 }
22500
22501 #ifdef HAVE_ALLOCA
22502 if (querylength5 <= MAX_STACK_READLENGTH) {
22503 FREEA(queryrc5);
22504 } else {
22505 FREE(queryrc5);
22506 }
22507 if (querylength3 <= MAX_STACK_READLENGTH) {
22508 FREEA(queryrc3);
22509 } else {
22510 FREE(queryrc3);
22511 }
22512 #else
22513 FREE(queryrc5);
22514 FREE(queryrc3);
22515 #endif
22516
22517 return stage3pairarray;
2330022518 }
2330122519
2330222520
2333422552 int maxpairedpaths = maxpaths_search; /* 100000 */
2333522553 #endif
2333622554 bool abort_pairing_p_geneplus, abort_pairing_p_geneminus;
23337
23338 #ifdef HAVE_ALLOCA
2333922555 char *queryrc5, *queryrc3;
23340 #else
23341 char queryrc5[MAX_READLENGTH+1], queryrc3[MAX_READLENGTH+1];
23342 #endif
2334322556
2334422557
2334522558 querylength5 = Shortread_fulllength(queryseq5);
2334622559 querylength3 = Shortread_fulllength(queryseq3);
2334722560
23348 #ifndef HAVE_ALLOCA
23349 if (querylength5 > MAX_READLENGTH || querylength3 > MAX_READLENGTH) {
23350 fprintf(stderr,"Paired-read %s has lengths %d and %d > MAX_READLENGTH %d. Either run configure and make again with a higher value of MAX_READLENGTH, or consider using GMAP instead.\n",
23351 Shortread_accession(queryseq5),querylength5,querylength3,MAX_READLENGTH);
23352 *npaths_primary = *npaths_altloc = 0;
23353 *nhits5_primary = *nhits5_altloc = 0;
23354 *nhits3_primary = *nhits3_altloc = 0;
23355 *stage3array5 = *stage3array3 = (Stage3end_T *) NULL;
23356 return (Stage3pair_T *) NULL;
23357 }
23358 #else
23359 queryrc5 = (char *) ALLOCA((querylength5+1)*sizeof(char));
23360 queryrc3 = (char *) ALLOCA((querylength3+1)*sizeof(char));
22561 #ifdef HAVE_ALLOCA
22562 if (querylength5 <= MAX_STACK_READLENGTH) {
22563 queryrc5 = (char *) ALLOCA((querylength5+1)*sizeof(char));
22564 } else {
22565 queryrc5 = (char *) MALLOC((querylength5+1)*sizeof(char));
22566 }
22567 if (querylength3 <= MAX_STACK_READLENGTH) {
22568 queryrc3 = (char *) ALLOCA((querylength3+1)*sizeof(char));
22569 } else {
22570 queryrc3 = (char *) MALLOC((querylength3+1)*sizeof(char));
22571 }
22572 #else
22573 queryrc5 = (char *) MALLOC((querylength5+1)*sizeof(char));
22574 queryrc3 = (char *) MALLOC((querylength3+1)*sizeof(char));
2336122575 #endif
2336222576
2336322577 if (user_maxlevel_float < 0.0) {
2345722671 terminals_geneminus,hits_geneminus_5,hits_geneminus_3,querylength5,querylength3);
2345822672
2345922673 if (abort_pairing_p_geneplus == true) {
23460 debug16(printf("abort_pairing_p_geneplus is true\n"));
23461 paired_results_free(this_geneplus_5,this_geneplus_3,hitpairs_geneplus,samechr_geneplus,conc_transloc_geneplus,
23462 terminals_geneplus,hits_geneplus_5,hits_geneplus_3,querylength5,querylength3);
23463
23464 this_geneplus_5 = Stage1_new(querylength5);
23465 this_geneplus_3 = Stage1_new(querylength3);
23466 realign_separately(stage3array5,&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
23467 stage3array3,&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
23468 this_geneplus_5,this_geneplus_3,
23469 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
23470 queryseq5,queryuc_ptr_5,queryrc5,quality_string_5,querylength5,query5_lastpos,
23471 queryseq3,queryuc_ptr_3,queryrc3,quality_string_3,querylength3,query3_lastpos,
23472 indexdb_fwd,indexdb_rev,indexdb_size_threshold,floors_array,
23473 user_maxlevel_5,user_maxlevel_3,min_coverage_5,min_coverage_3,
23474 indel_penalty_middle,indel_penalty_end,
23475 allow_end_indels_p,max_end_insertions,max_end_deletions,min_indel_end_matches,
23476 localsplicing_penalty,distantsplicing_penalty,min_shortend,
23477 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR,
23478 keep_floors_p,/*genestrand*/+1);
23479
23480 *npaths_primary = *npaths_altloc = 0;
23481 *final_pairtype = UNPAIRED;
23482 History_free(&gmap_history_3);
23483 History_free(&gmap_history_5);
23484 Compress_free(&query5_compress_fwd);
23485 Compress_free(&query5_compress_rev);
23486 Compress_free(&query3_compress_fwd);
23487 Compress_free(&query3_compress_rev);
23488 Stage1_free(&this_geneplus_5,querylength5);
23489 Stage1_free(&this_geneplus_3,querylength3);
23490 return (Stage3pair_T *) NULL;
23491
23492 } else {
23493 stage3pairarray =
23494 consolidate_paired_results(&(*npaths_primary),&(*npaths_altloc),&(*first_absmq),&(*second_absmq),&(*final_pairtype),
23495 &(*stage3array5),&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
23496 &(*stage3array3),&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
23497 hitpairs_geneplus,samechr_geneplus,conc_transloc_geneplus,terminals_geneplus,
23498 hits_geneplus_5,hits_geneplus_3,
23499 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
23500 queryseq5,queryuc_ptr_5,quality_string_5,querylength5,
23501 queryseq3,queryuc_ptr_3,quality_string_3,querylength3,
23502 cutoff_level_5,cutoff_level_3,min_coverage_5,min_coverage_3,
23503 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR);
23504 History_free(&gmap_history_3);
23505 History_free(&gmap_history_5);
23506 Compress_free(&query5_compress_fwd);
23507 Compress_free(&query5_compress_rev);
23508 Compress_free(&query3_compress_fwd);
23509 Compress_free(&query3_compress_rev);
23510 Stage1_free(&this_geneplus_5,querylength5);
23511 Stage1_free(&this_geneplus_3,querylength3);
23512 return stage3pairarray;
23513 }
22674 debug16(printf("abort_pairing_p_geneplus is true\n"));
22675 paired_results_free(this_geneplus_5,this_geneplus_3,hitpairs_geneplus,samechr_geneplus,conc_transloc_geneplus,
22676 terminals_geneplus,hits_geneplus_5,hits_geneplus_3,querylength5,querylength3);
22677
22678 this_geneplus_5 = Stage1_new(querylength5);
22679 this_geneplus_3 = Stage1_new(querylength3);
22680 realign_separately(stage3array5,&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
22681 stage3array3,&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
22682 this_geneplus_5,this_geneplus_3,
22683 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
22684 queryseq5,queryuc_ptr_5,queryrc5,quality_string_5,querylength5,query5_lastpos,
22685 queryseq3,queryuc_ptr_3,queryrc3,quality_string_3,querylength3,query3_lastpos,
22686 indexdb_fwd,indexdb_rev,indexdb_size_threshold,floors_array,
22687 user_maxlevel_5,user_maxlevel_3,min_coverage_5,min_coverage_3,
22688 indel_penalty_middle,indel_penalty_end,
22689 allow_end_indels_p,max_end_insertions,max_end_deletions,min_indel_end_matches,
22690 localsplicing_penalty,distantsplicing_penalty,min_shortend,
22691 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR,
22692 keep_floors_p,/*genestrand*/+1);
22693
22694 *npaths_primary = *npaths_altloc = 0;
22695 *final_pairtype = UNPAIRED;
22696 History_free(&gmap_history_3);
22697 History_free(&gmap_history_5);
22698 Compress_free(&query5_compress_fwd);
22699 Compress_free(&query5_compress_rev);
22700 Compress_free(&query3_compress_fwd);
22701 Compress_free(&query3_compress_rev);
22702 Stage1_free(&this_geneplus_5,querylength5);
22703 Stage1_free(&this_geneplus_3,querylength3);
22704
22705 stage3pairarray = (Stage3pair_T *) NULL;
22706
22707 } else {
22708 stage3pairarray =
22709 consolidate_paired_results(&(*npaths_primary),&(*npaths_altloc),&(*first_absmq),&(*second_absmq),&(*final_pairtype),
22710 &(*stage3array5),&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
22711 &(*stage3array3),&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
22712 hitpairs_geneplus,samechr_geneplus,conc_transloc_geneplus,terminals_geneplus,
22713 hits_geneplus_5,hits_geneplus_3,
22714 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
22715 queryseq5,queryuc_ptr_5,quality_string_5,querylength5,
22716 queryseq3,queryuc_ptr_3,quality_string_3,querylength3,
22717 cutoff_level_5,cutoff_level_3,min_coverage_5,min_coverage_3,
22718 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR);
22719 History_free(&gmap_history_3);
22720 History_free(&gmap_history_5);
22721 Compress_free(&query5_compress_fwd);
22722 Compress_free(&query5_compress_rev);
22723 Compress_free(&query3_compress_fwd);
22724 Compress_free(&query3_compress_rev);
22725 Stage1_free(&this_geneplus_5,querylength5);
22726 Stage1_free(&this_geneplus_3,querylength3);
22727 /* return stage3pairarray; */
22728 }
2351422729
2351522730 } else if (found_score_geneminus < found_score_geneplus) {
2351622731 paired_results_free(this_geneplus_5,this_geneplus_3,hitpairs_geneplus,samechr_geneplus,conc_transloc_geneplus,
2351722732 terminals_geneplus,hits_geneplus_5,hits_geneplus_3,querylength5,querylength3);
2351822733
2351922734 if (abort_pairing_p_geneminus == true) {
23520 debug16(printf("abort_pairing_p_geneminus is true\n"));
23521 paired_results_free(this_geneminus_5,this_geneminus_3,hitpairs_geneminus,samechr_geneminus,conc_transloc_geneminus,
23522 terminals_geneminus,hits_geneminus_5,hits_geneminus_3,querylength5,querylength3);
23523
23524 this_geneminus_5 = Stage1_new(querylength5);
23525 this_geneminus_3 = Stage1_new(querylength3);
23526 realign_separately(stage3array5,&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
23527 stage3array3,&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
23528 this_geneminus_5,this_geneminus_3,
23529 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
23530 queryseq5,queryuc_ptr_5,queryrc5,quality_string_5,querylength5,query5_lastpos,
23531 queryseq3,queryuc_ptr_3,queryrc3,quality_string_3,querylength3,query3_lastpos,
23532 indexdb_fwd,indexdb_rev,indexdb_size_threshold,floors_array,
23533 user_maxlevel_5,user_maxlevel_3,min_coverage_5,min_coverage_3,
23534 indel_penalty_middle,indel_penalty_end,
23535 allow_end_indels_p,max_end_insertions,max_end_deletions,min_indel_end_matches,
23536 localsplicing_penalty,distantsplicing_penalty,min_shortend,
23537 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR,
23538 keep_floors_p,/*genestrand*/+2);
23539
23540 *npaths_primary = *npaths_altloc = 0;
23541 *final_pairtype = UNPAIRED;
23542 History_free(&gmap_history_3);
23543 History_free(&gmap_history_5);
23544 Compress_free(&query5_compress_fwd);
23545 Compress_free(&query5_compress_rev);
23546 Compress_free(&query3_compress_fwd);
23547 Compress_free(&query3_compress_rev);
23548 Stage1_free(&this_geneminus_5,querylength5);
23549 Stage1_free(&this_geneminus_3,querylength3);
23550 return (Stage3pair_T *) NULL;
23551
23552 } else {
23553 stage3pairarray =
23554 consolidate_paired_results(&(*npaths_primary),&(*npaths_altloc),&(*first_absmq),&(*second_absmq),&(*final_pairtype),
23555 &(*stage3array5),&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
23556 &(*stage3array3),&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
23557 hitpairs_geneminus,samechr_geneminus,conc_transloc_geneminus,terminals_geneminus,
23558 hits_geneminus_5,hits_geneminus_3,
23559 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
23560 queryseq5,queryuc_ptr_5,quality_string_5,querylength5,
23561 queryseq3,queryuc_ptr_3,quality_string_3,querylength3,
23562 cutoff_level_5,cutoff_level_3,min_coverage_5,min_coverage_3,
23563 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR);
23564 History_free(&gmap_history_3);
23565 History_free(&gmap_history_5);
23566 Compress_free(&query5_compress_fwd);
23567 Compress_free(&query5_compress_rev);
23568 Compress_free(&query3_compress_fwd);
23569 Compress_free(&query3_compress_rev);
23570 Stage1_free(&this_geneminus_5,querylength5);
23571 Stage1_free(&this_geneminus_3,querylength3);
23572 return stage3pairarray;
23573 }
22735 debug16(printf("abort_pairing_p_geneminus is true\n"));
22736 paired_results_free(this_geneminus_5,this_geneminus_3,hitpairs_geneminus,samechr_geneminus,conc_transloc_geneminus,
22737 terminals_geneminus,hits_geneminus_5,hits_geneminus_3,querylength5,querylength3);
22738
22739 this_geneminus_5 = Stage1_new(querylength5);
22740 this_geneminus_3 = Stage1_new(querylength3);
22741 realign_separately(stage3array5,&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
22742 stage3array3,&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
22743 this_geneminus_5,this_geneminus_3,
22744 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
22745 queryseq5,queryuc_ptr_5,queryrc5,quality_string_5,querylength5,query5_lastpos,
22746 queryseq3,queryuc_ptr_3,queryrc3,quality_string_3,querylength3,query3_lastpos,
22747 indexdb_fwd,indexdb_rev,indexdb_size_threshold,floors_array,
22748 user_maxlevel_5,user_maxlevel_3,min_coverage_5,min_coverage_3,
22749 indel_penalty_middle,indel_penalty_end,
22750 allow_end_indels_p,max_end_insertions,max_end_deletions,min_indel_end_matches,
22751 localsplicing_penalty,distantsplicing_penalty,min_shortend,
22752 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR,
22753 keep_floors_p,/*genestrand*/+2);
22754
22755 *npaths_primary = *npaths_altloc = 0;
22756 *final_pairtype = UNPAIRED;
22757 History_free(&gmap_history_3);
22758 History_free(&gmap_history_5);
22759 Compress_free(&query5_compress_fwd);
22760 Compress_free(&query5_compress_rev);
22761 Compress_free(&query3_compress_fwd);
22762 Compress_free(&query3_compress_rev);
22763 Stage1_free(&this_geneminus_5,querylength5);
22764 Stage1_free(&this_geneminus_3,querylength3);
22765
22766 stage3pairarray = (Stage3pair_T *) NULL;
22767
22768 } else {
22769 stage3pairarray =
22770 consolidate_paired_results(&(*npaths_primary),&(*npaths_altloc),&(*first_absmq),&(*second_absmq),&(*final_pairtype),
22771 &(*stage3array5),&(*nhits5_primary),&(*nhits5_altloc),&(*first_absmq5),&(*second_absmq5),
22772 &(*stage3array3),&(*nhits3_primary),&(*nhits3_altloc),&(*first_absmq3),&(*second_absmq3),
22773 hitpairs_geneminus,samechr_geneminus,conc_transloc_geneminus,terminals_geneminus,
22774 hits_geneminus_5,hits_geneminus_3,
22775 query5_compress_fwd,query5_compress_rev,query3_compress_fwd,query3_compress_rev,
22776 queryseq5,queryuc_ptr_5,quality_string_5,querylength5,
22777 queryseq3,queryuc_ptr_3,quality_string_3,querylength3,
22778 cutoff_level_5,cutoff_level_3,min_coverage_5,min_coverage_3,
22779 oligoindices_minor,pairpool,diagpool,cellpool,dynprogL,dynprogM,dynprogR);
22780 History_free(&gmap_history_3);
22781 History_free(&gmap_history_5);
22782 Compress_free(&query5_compress_fwd);
22783 Compress_free(&query5_compress_rev);
22784 Compress_free(&query3_compress_fwd);
22785 Compress_free(&query3_compress_rev);
22786 Stage1_free(&this_geneminus_5,querylength5);
22787 Stage1_free(&this_geneminus_3,querylength3);
22788 /* return stage3pairarray; */
22789 }
2357422790
2357522791 } else {
2357622792 hitpairs = List_append(hitpairs_geneplus,hitpairs_geneminus);
2360022816 Stage1_free(&this_geneminus_3,querylength3);
2360122817 Stage1_free(&this_geneplus_5,querylength5);
2360222818 Stage1_free(&this_geneplus_3,querylength3);
23603 return stage3pairarray;
23604 }
22819 /* return stage3pairarray */
22820 }
22821
22822 #ifdef HAVE_ALLOCA
22823 if (querylength5 <= MAX_STACK_READLENGTH) {
22824 FREEA(queryrc5);
22825 } else {
22826 FREE(queryrc5);
22827 }
22828 if (querylength3 <= MAX_STACK_READLENGTH) {
22829 FREEA(queryrc3);
22830 } else {
22831 FREE(queryrc3);
22832 }
22833 #else
22834 FREE(queryrc5);
22835 FREE(queryrc3);
22836 #endif
22837
22838 return stage3pairarray;
2360522839 }
2360622840
2360722841
2367722911 int extramaterial_end_in, int extramaterial_paired_in,
2367822912 int gmap_mode, int trigger_score_for_gmap_in, int gmap_allowance_in,
2367922913 int max_gmap_pairsearch_in, int max_gmap_segments_in,
23680 int max_gmap_improvement_in, int antistranded_penalty_in) {
22914 int max_gmap_improvement_in, int antistranded_penalty_in,
22915 int max_floors_readlength_in) {
2368122916 bool gmapp = false;
2368222917
2368322918 use_sarray_p = use_sarray_p_in;
2381823053 snpp = false;
2381923054 }
2382023055
23056 max_floors_readlength = max_floors_readlength_in;
23057
2382123058 return;
2382223059 }
0 /* $Id: stage1hr.h 186091 2016-03-17 22:23:16Z twu $ */
0 /* $Id: stage1hr.h 196434 2016-08-16 20:21:03Z twu $ */
11 #ifndef STAGE1HR_INCLUDED
22 #define STAGE1HR_INCLUDED
33
106106 int extramaterial_end_in, int extramaterial_paired_in,
107107 int gmap_mode, int trigger_score_for_gmap_in, int gmap_allowance_in,
108108 int max_gmap_pairsearch_in, int max_gmap_terminal_in,
109 int max_gmap_improvement_in, int antistranded_penalty_in);
109 int max_gmap_improvement_in, int antistranded_penalty_in,
110 int max_floors_readlength_in);
110111
111112
112113 #undef T
0 static char rcsid[] = "$Id: stage3.c 195963 2016-08-08 16:38:05Z twu $";
0 static char rcsid[] = "$Id: stage3.c 196409 2016-08-16 15:42:27Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
36223622 debug3(Pair_dump_list(exon,true));
36233623
36243624
3625 if (exon == NULL) {
3626 *trim5p = false;
3627 return pairs;
3628 }
3629
36253630 max_nmatches = max_nmismatches = 0;
36263631 nmatches = nmismatches = 0;
36273632 max_score = score = 0;
39093914 debug3(printf("End exon:\n"));
39103915 debug3(Pair_dump_list(exon,true));
39113916
3917
3918 if (exon == NULL) {
3919 *trim3p = false;
3920 return path;
3921 }
39123922
39133923 max_nmatches = max_nmismatches = 0;
39143924 nmatches = nmismatches = 0;
1240312413 int sense_try, int sense_filter,
1240412414 Oligoindex_array_T oligoindices_minor, Diagpool_T diagpool, Cellpool_T cellpool) {
1240512415 struct Pair_T *pairarray1;
12406 List_T pairs_fwd_copy, pairs_rev_copy, p;
12416 List_T p;
1240712417 Chrpos_T *last_genomedp5_fwd = NULL, *last_genomedp3_fwd = NULL, *last_genomedp5_rev = NULL, *last_genomedp3_rev = NULL;
1240812418 List_T pairs_pretrim, pairs_fwd, pairs_rev, best_pairs, temp_pairs, path_fwd, path_rev, best_path, temp_path;
1240912419 List_T copy;
1241912429 int fwd_ambig_end_length_5 = 0, fwd_ambig_end_length_3 = 0, rev_ambig_end_length_5 = 0, rev_ambig_end_length_3 = 0, temp_ambig_end_length;
1242012430 Splicetype_T fwd_ambig_splicetype_5, fwd_ambig_splicetype_3, rev_ambig_splicetype_5, rev_ambig_splicetype_3, temp_ambig_splicetype;
1242112431 double fwd_ambig_prob_5, fwd_ambig_prob_3, rev_ambig_prob_5, rev_ambig_prob_3, temp_ambig_prob;
12432 #ifdef GSNAP
12433 List_T pairs_fwd_copy, pairs_rev_copy;
12434 #endif
12435
12436
1242212437
1242312438 #ifdef COMPLEX_DIRECTION
1242412439 int indel_alignment_score_fwd, indel_alignment_score_rev;
0 static char rcsid[] = "$Id: stage3hr.c 195760 2016-08-04 00:12:04Z twu $";
0 static char rcsid[] = "$Id: stage3hr.c 196429 2016-08-16 20:09:56Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
17811781 Substring_T substring;
17821782 Junction_T junction;
17831783
1784 debug0(printf("Freeing Stage3end %p of type %s\n",*old,hittype_string((*old)->hittype)));
1784 #ifdef DEBUG0
1785 printf("Freeing Stage3end %p of type %s",*old,hittype_string((*old)->hittype));
1786 if ((*old)->hittype == SUBSTRINGS) {
1787 if (Substring_list_ambiguous_p((*old)->substrings_1toN) == true) {
1788 printf(" ambiguous");
1789 } else {
1790 printf(" not ambiguous");
1791 }
1792 }
1793 printf("\n");
1794 #endif
17851795
17861796 #if 0
17871797 FREE_OUT((*old)->ambcoords_donor);
78627872 new->genomicstart = Substring_genomicstart(acceptor);
78637873 new->genomicend = Substring_genomicend(acceptor);
78647874
7865 donor = Substring_new_ambig(/*querystart*/0,/*queryend*/Substring_querystart(acceptor),
7866 /*splice_pos*/Substring_querystart(acceptor),querylength,
7867 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7868 new->plusp,new->genestrand,
7869 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
7870 /*amb_common_prob*/acceptor_prob,/*amb_donor_common_p*/false,
7871 /*substring1p*/true);
7875 donor = Substring_new_ambig_D(/*querystart*/0,/*queryend*/Substring_querystart(acceptor),
7876 /*splice_pos*/Substring_querystart(acceptor),querylength,
7877 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7878 new->plusp,new->genestrand,
7879 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
7880 /*amb_common_prob*/acceptor_prob,/*substring1p*/true);
78727881 debug0(printf("Making sense ambiguous donor at %d..%d with %d matches\n",
78737882 0,Substring_querystart(acceptor),Substring_nmatches(donor)));
78747883 donor_prob = Doublelist_max(amb_probs_donor);
78777886 new->genomicstart = Substring_genomicstart(donor);
78787887 new->genomicend = Substring_genomicend(donor);
78797888
7880 acceptor = Substring_new_ambig(/*querystart*/Substring_queryend(donor),/*queryend*/querylength,
7881 /*splice_pos*/Substring_queryend(donor),querylength,
7882 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7883 new->plusp,new->genestrand,
7884 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
7885 /*amb_common_prob*/donor_prob,/*amb_donor_common_p*/true,
7886 /*substring1p*/false);
7889 acceptor = Substring_new_ambig_A(/*querystart*/Substring_queryend(donor),/*queryend*/querylength,
7890 /*splice_pos*/Substring_queryend(donor),querylength,
7891 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7892 new->plusp,new->genestrand,
7893 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
7894 /*amb_common_prob*/donor_prob,/*substring1p*/false);
78877895 debug0(printf("Making sense ambiguous acceptor at %d..%d with %d matches\n",
78887896 Substring_queryend(donor),querylength,Substring_nmatches(acceptor)));
78897897 acceptor_prob = Doublelist_max(amb_probs_acceptor);
78997907 new->genomicstart = Substring_genomicstart(acceptor);
79007908 new->genomicend = Substring_genomicend(acceptor);
79017909
7902 donor = Substring_new_ambig(/*querystart*/Substring_queryend(acceptor),/*queryend*/querylength,
7903 /*splice_pos*/Substring_queryend(acceptor),querylength,
7904 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7905 new->plusp,new->genestrand,
7906 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
7907 /*amb_common_prob*/acceptor_prob,/*amb_donor_common_p*/false,
7908 /*substring1p*/false);
7910 donor = Substring_new_ambig_D(/*querystart*/Substring_queryend(acceptor),/*queryend*/querylength,
7911 /*splice_pos*/Substring_queryend(acceptor),querylength,
7912 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7913 new->plusp,new->genestrand,
7914 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
7915 /*amb_common_prob*/acceptor_prob,/*substring1p*/false);
79097916 debug0(printf("Making antisense ambiguous donor at %d..%d with %d matches\n",
79107917 Substring_queryend(acceptor),querylength,Substring_nmatches(donor)));
79117918 donor_prob = Doublelist_max(amb_probs_donor);
79147921 new->genomicstart = Substring_genomicstart(donor);
79157922 new->genomicend = Substring_genomicend(donor);
79167923
7917 acceptor = Substring_new_ambig(/*querystart*/0,/*queryend*/Substring_querystart(donor),
7918 /*splice_pos*/Substring_querystart(donor),querylength,
7919 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7920 new->plusp,new->genestrand,
7921 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
7922 /*amb_common_prob*/donor_prob,/*amb_donor_common_p*/true,
7923 /*substring1p*/true);
7924 acceptor = Substring_new_ambig_A(/*querystart*/0,/*queryend*/Substring_querystart(donor),
7925 /*splice_pos*/Substring_querystart(donor),querylength,
7926 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
7927 new->plusp,new->genestrand,
7928 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
7929 /*amb_common_prob*/donor_prob,/*substring1p*/true);
79247930 debug0(printf("Making antisense ambiguous acceptor at %d..%d with %d matches\n",
79257931 0,Substring_querystart(donor),Substring_nmatches(acceptor)));
79267932 acceptor_prob = Doublelist_max(amb_probs_acceptor);
79897995 new->substrings_1toN = List_copy(new->substrings_LtoH);
79907996 new->substrings_Nto1 = List_reverse(List_copy(new->substrings_LtoH));
79917997 assert(Substring_querystart(List_head(new->substrings_1toN)) < Substring_querystart(List_head(new->substrings_Nto1)));
7992
79937998
79947999 if (first_read_p == true) {
79958000 substring_for_concordance = (Substring_T) List_head(new->substrings_Nto1);
81538158 debug0(printf("Returning new splice %p at genomic %u..%u, donor %p (%u => %u), acceptor %p (%u => %u), score %d\n",
81548159 new,new->genomicstart - new->chroffset,new->genomicend - new->chroffset,donor,
81558160 donor == NULL ? 0 : Substring_left_genomicseg(donor),
8156 donor == NULL ? 0 : Substring_splicecoord(donor),
8161 donor == NULL ? 0 : Substring_splicecoord_D(donor),
81578162 acceptor,acceptor == NULL ? 0 : Substring_left_genomicseg(acceptor),
8158 acceptor == NULL ? 0 : Substring_splicecoord(acceptor),new->score));
8163 acceptor == NULL ? 0 : Substring_splicecoord_A(acceptor),new->score));
81598164 debug0(printf("sensedir %d\n",new->sensedir));
81608165 return new;
81618166 }
82378242 /* Compute distances */
82388243 if (donor == NULL) {
82398244 new->shortexonA_distance = 0;
8240 } else if (Substring_splicecoord_A(shortexon) > Substring_splicecoord(donor)) {
8241 new->shortexonA_distance = Substring_splicecoord_A(shortexon) - Substring_splicecoord(donor);
8242 } else {
8243 new->shortexonA_distance = Substring_splicecoord(donor) - Substring_splicecoord_A(shortexon);
8245 } else if (Substring_splicecoord_A(shortexon) > Substring_splicecoord_D(donor)) {
8246 new->shortexonA_distance = Substring_splicecoord_A(shortexon) - Substring_splicecoord_D(donor);
8247 } else {
8248 new->shortexonA_distance = Substring_splicecoord_D(donor) - Substring_splicecoord_A(shortexon);
82448249 }
82458250
82468251 if (acceptor == NULL) {
82478252 new->shortexonD_distance = 0;
8248 } else if (Substring_splicecoord_D(shortexon) > Substring_splicecoord(acceptor)) {
8249 new->shortexonD_distance = Substring_splicecoord_D(shortexon) - Substring_splicecoord(acceptor);
8250 } else {
8251 new->shortexonD_distance = Substring_splicecoord(acceptor) - Substring_splicecoord_D(shortexon);
8253 } else if (Substring_splicecoord_D(shortexon) > Substring_splicecoord_A(acceptor)) {
8254 new->shortexonD_distance = Substring_splicecoord_D(shortexon) - Substring_splicecoord_A(acceptor);
8255 } else {
8256 new->shortexonD_distance = Substring_splicecoord_A(acceptor) - Substring_splicecoord_D(shortexon);
82528257 }
82538258 new->distance = new->shortexonA_distance + new->shortexonD_distance;
82548259
82688273 if (sensedir == SENSE_FORWARD) {
82698274 substring0 = copy_donor_p ? Substring_copy(donor) : donor;
82708275 if (donor == NULL) {
8271 donor = substring0 = Substring_new_ambig(/*querystart*/0,/*queryend*/Substring_querystart(shortexon),
8272 /*splice_pos*/Substring_querystart(shortexon),querylength,
8273 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8274 new->plusp,new->genestrand,
8275 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
8276 /*amb_common_prob*/acceptor_prob,/*amb_donor_common_p*/false,
8277 /*substring1p*/true);
8276 donor = substring0 = Substring_new_ambig_D(/*querystart*/0,/*queryend*/Substring_querystart(shortexon),
8277 /*splice_pos*/Substring_querystart(shortexon),querylength,
8278 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8279 new->plusp,new->genestrand,
8280 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
8281 /*amb_common_prob*/acceptor_prob,/*substring1p*/true);
82788282 /* new->start_amb_prob = Doublelist_max(amb_probs_donor); */
82798283 /* new->start_amb_length = amb_length_donor; */
82808284 junction0 = Junction_new_splice(/*distance*/0,sensedir,Doublelist_max(amb_probs_donor),shortexonA_prob);
8281 } else if (Substring_splicecoord_A(shortexon) > Substring_splicecoord(donor)) {
8282 distance = Substring_splicecoord_A(shortexon) - Substring_splicecoord(donor);
8285 } else if (Substring_splicecoord_A(shortexon) > Substring_splicecoord_D(donor)) {
8286 distance = Substring_splicecoord_A(shortexon) - Substring_splicecoord_D(donor);
82838287 junction0 = Junction_new_splice(distance,sensedir,donor_prob,shortexonA_prob);
82848288 } else {
8285 distance = Substring_splicecoord(donor) - Substring_splicecoord_A(shortexon);
8289 distance = Substring_splicecoord_D(donor) - Substring_splicecoord_A(shortexon);
82868290 junction0 = Junction_new_splice(distance,sensedir,donor_prob,shortexonA_prob);
82878291 }
82888292
82898293 substring2 = copy_acceptor_p ? Substring_copy(acceptor) : acceptor;
82908294 if (acceptor == NULL) {
8291 acceptor = substring2 = Substring_new_ambig(/*querystart*/Substring_queryend(shortexon),/*queryend*/querylength,
8292 /*splice_pos*/Substring_queryend(shortexon),querylength,
8293 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8294 new->plusp,new->genestrand,
8295 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
8296 /*amb_common_prob*/donor_prob,/*amb_donor_common_p*/true,
8297 /*substring1p*/false);
8295 acceptor = substring2 = Substring_new_ambig_A(/*querystart*/Substring_queryend(shortexon),/*queryend*/querylength,
8296 /*splice_pos*/Substring_queryend(shortexon),querylength,
8297 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8298 new->plusp,new->genestrand,
8299 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
8300 /*amb_common_prob*/donor_prob,/*substring1p*/false);
82988301 /* new->end_amb_prob = Doublelist_max(amb_probs_acceptor); */
82998302 /* new->end_amb_length = amb_length_acceptor; */
83008303 junction2 = Junction_new_splice(/*distance*/0,sensedir,shortexonD_prob,Doublelist_max(amb_probs_acceptor));
8301 } else if (Substring_splicecoord_D(shortexon) > Substring_splicecoord(acceptor)) {
8302 distance = Substring_splicecoord_D(shortexon) - Substring_splicecoord(acceptor);
8304 } else if (Substring_splicecoord_D(shortexon) > Substring_splicecoord_A(acceptor)) {
8305 distance = Substring_splicecoord_D(shortexon) - Substring_splicecoord_A(acceptor);
83038306 junction2 = Junction_new_splice(distance,sensedir,shortexonD_prob,acceptor_prob);
83048307 } else {
8305 distance = Substring_splicecoord(acceptor) - Substring_splicecoord_D(shortexon);
8308 distance = Substring_splicecoord_A(acceptor) - Substring_splicecoord_D(shortexon);
83068309 junction2 = Junction_new_splice(distance,sensedir,shortexonD_prob,acceptor_prob);
83078310 }
83088311
83098312 } else if (sensedir == SENSE_ANTI) {
83108313 substring0 = copy_acceptor_p ? Substring_copy(acceptor) : acceptor;
83118314 if (acceptor == NULL) {
8312 acceptor = substring0 = Substring_new_ambig(/*querystart*/0,/*queryend*/Substring_querystart(shortexon),
8313 /*splice_pos*/Substring_querystart(shortexon),querylength,
8314 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8315 new->plusp,new->genestrand,
8316 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
8317 /*amb_common_prob*/donor_prob,/*amb_donor_common_p*/true,
8318 /*substring1p*/true);
8315 acceptor = substring0 = Substring_new_ambig_A(/*querystart*/0,/*queryend*/Substring_querystart(shortexon),
8316 /*splice_pos*/Substring_querystart(shortexon),querylength,
8317 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8318 new->plusp,new->genestrand,
8319 ambcoords_acceptor,amb_knowni_acceptor,amb_nmismatches_acceptor,amb_probs_acceptor,
8320 /*amb_common_prob*/donor_prob,/*substring1p*/true);
83198321 /* new->start_amb_prob = Doublelist_max(amb_probs_acceptor); */
83208322 /* new->start_amb_length = amb_length_acceptor; */
83218323 junction0 = Junction_new_splice(/*distance*/0,sensedir,shortexonD_prob,Doublelist_max(amb_probs_acceptor));
8322 } else if (Substring_splicecoord_D(shortexon) > Substring_splicecoord(acceptor)) {
8323 distance = Substring_splicecoord_D(shortexon) - Substring_splicecoord(acceptor);
8324 } else if (Substring_splicecoord_D(shortexon) > Substring_splicecoord_A(acceptor)) {
8325 distance = Substring_splicecoord_D(shortexon) - Substring_splicecoord_A(acceptor);
83248326 junction0 = Junction_new_splice(distance,sensedir,shortexonD_prob,acceptor_prob);
83258327 } else {
8326 distance = Substring_splicecoord(acceptor) - Substring_splicecoord_D(shortexon);
8328 distance = Substring_splicecoord_A(acceptor) - Substring_splicecoord_D(shortexon);
83278329 junction0 = Junction_new_splice(distance,sensedir,shortexonD_prob,acceptor_prob);
83288330 }
83298331
83308332 substring2 = copy_donor_p ? Substring_copy(donor) : donor;
83318333 if (donor == NULL) {
8332 donor = substring2 = Substring_new_ambig(/*querystart*/Substring_queryend(shortexon),/*queryend*/querylength,
8333 /*splice_pos*/Substring_queryend(shortexon),querylength,
8334 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8335 new->plusp,new->genestrand,
8336 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
8337 /*amb_common_prob*/acceptor_prob,/*amb_donor_common_p*/false,
8338 /*substring1p*/false);
8334 donor = substring2 = Substring_new_ambig_D(/*querystart*/Substring_queryend(shortexon),/*queryend*/querylength,
8335 /*splice_pos*/Substring_queryend(shortexon),querylength,
8336 new->chrnum,new->chroffset,new->chrhigh,new->chrlength,
8337 new->plusp,new->genestrand,
8338 ambcoords_donor,amb_knowni_donor,amb_nmismatches_donor,amb_probs_donor,
8339 /*amb_common_prob*/acceptor_prob,/*substring1p*/false);
83398340 /* new->end_amb_prob = Doublelist_max(amb_probs_donor); */
83408341 /* new->end_amb_length = amb_length_donor; */
83418342 junction2 = Junction_new_splice(/*distance*/0,sensedir,Doublelist_max(amb_probs_donor),shortexonA_prob);
8342 } else if (Substring_splicecoord_A(shortexon) > Substring_splicecoord(donor)) {
8343 distance = Substring_splicecoord_A(shortexon) - Substring_splicecoord(donor);
8343 } else if (Substring_splicecoord_A(shortexon) > Substring_splicecoord_D(donor)) {
8344 distance = Substring_splicecoord_A(shortexon) - Substring_splicecoord_D(donor);
83448345 junction2 = Junction_new_splice(distance,sensedir,donor_prob,shortexonA_prob);
83458346 } else {
8346 distance = Substring_splicecoord(donor) - Substring_splicecoord_A(shortexon);
8347 distance = Substring_splicecoord_D(donor) - Substring_splicecoord_A(shortexon);
83478348 junction2 = Junction_new_splice(distance,sensedir,donor_prob,shortexonA_prob);
83488349 }
83498350
1015310154 if (hit->hittype == TERMINAL) {
1015410155 /* Don't allow terminals to set trims */
1015510156
10157 } else if (hit->hittype == INSERTION || hit->hittype == DELETION) {
10158 /* Don't allow indels to set trims, since they artificially align at the end */
10159
1015610160 #if 0
1015710161 } else if ((hit->hittype == INSERTION || hit->hittype == DELETION) &&
1015810162 (hit->indel_pos < 15 || hit->indel_pos > hit->querylength - 15)) {
1138911393 }
1139011394 #endif
1139111395
11392 /* Favors definitive splices over ambiguous splices. So need to
11393 make sure we don't make definitive splices unnecessarily */
11396 /* Favors ambiguous splices over definitive splices */
1139411397 if (hit->nsegments > best_hit->nsegments) {
1139511398 if (hit->nmatches_posttrim > best_hit->nmatches_posttrim) {
1139611399 /* More segments and strictly more matches */
1143211435 debug7(printf(" => %d wins by hittype\n",k));
1143311436 return +1;
1143411437
11435 #if 0
11436 } else if (hit->start_amb_length + hit->end_amb_length == 0 &&
11437 best_hit->start_amb_length + best_hit->end_amb_length > 0) {
11438 } else if (start_amb_length(hit) + end_amb_length(hit) > 0 &&
11439 start_amb_length(best_hit) + end_amb_length(best_hit) == 0) {
1143811440 debug7(printf(" => %d loses by ambiguity\n",k));
1143911441 return -1;
11440 } else if (hit->start_amb_length + hit->end_amb_length > 0 &&
11441 best_hit->start_amb_length + best_hit->end_amb_length == 0) {
11442 } else if (start_amb_length(hit) + end_amb_length(hit) == 0 &&
11443 start_amb_length(best_hit) + end_amb_length(best_hit) > 0) {
1144211444 debug7(printf(" => %d wins by ambiguity\n",k));
1144311445 return +1;
11444 #endif
1144511446
1144611447 } else if (hit->nindels > best_hit->nindels) {
1144711448 debug7(printf(" => %d loses by nindels\n",k));
1550615507 #endif
1550715508
1550815509
15509 /* Favors definitive splices over ambiguous splices. So need to
15510 make sure we don't make definitive splices unnecessarily */
15511
15510 /* Favors ambiguous splices over definitive splices */
1551215511 if (hitpair->hit5->nsegments + hitpair->hit3->nsegments > best_hitpair->hit5->nsegments + best_hitpair->hit3->nsegments) {
1551315512 if (hitpair->nmatches_posttrim > best_hitpair->nmatches_posttrim) {
1551415513 /* More segments and strictly more matches */
1559715596 return +1;
1559815597 #endif
1559915598
15600 #if 0
15601 } else if (hitpair->hit5->start_amb_length + hitpair->hit5->end_amb_length +
15602 hitpair->hit3->start_amb_length + hitpair->hit3->end_amb_length > 0 &&
15603 best_hitpair->hit5->start_amb_length + best_hitpair->hit5->end_amb_length +
15604 best_hitpair->hit3->start_amb_length + best_hitpair->hit3->end_amb_length == 0) {
15599 } else if (start_amb_length(hitpair->hit5) + end_amb_length(hitpair->hit5) +
15600 start_amb_length(hitpair->hit3) + end_amb_length(hitpair->hit3) == 0 &&
15601 start_amb_length(best_hitpair->hit5) + end_amb_length(best_hitpair->hit5) +
15602 start_amb_length(best_hitpair->hit3) + end_amb_length(best_hitpair->hit3) > 0) {
1560515603 debug8(printf(" => loses by ambiguity\n"));
1560615604 return -1;
1560715605
15608 } else if (hitpair->hit5->start_amb_length + hitpair->hit5->end_amb_length +
15609 hitpair->hit3->start_amb_length + hitpair->hit3->end_amb_length == 0 &&
15610 best_hitpair->hit5->start_amb_length + best_hitpair->hit5->end_amb_length +
15611 best_hitpair->hit3->start_amb_length + best_hitpair->hit3->end_amb_length > 0) {
15606 } else if (start_amb_length(hitpair->hit5) + end_amb_length(hitpair->hit5) +
15607 start_amb_length(hitpair->hit3) + end_amb_length(hitpair->hit3) > 0 &&
15608 start_amb_length(best_hitpair->hit5) + end_amb_length(best_hitpair->hit5) +
15609 start_amb_length(best_hitpair->hit3) + end_amb_length(best_hitpair->hit3) == 0) {
1561215610 debug8(printf(" => wins by ambiguity\n"));
1561315611 return +1;
15614 #endif
1561515612
1561615613 #if 0
1561715614 } else if (hitpair->absdifflength < best_hitpair->absdifflength) {
1653116528 if (hit5->hittype == TERMINAL) {
1653216529 /* Don't allow terminals to set trims */
1653316530
16531 } else if (hit5->hittype == INSERTION || hit5->hittype == DELETION) {
16532 /* Don't allow indels to set trims, since they artificially align at the end */
16533
1653416534 #if 0
1653516535 } else if ((hit5->hittype == INSERTION || hit5->hittype == DELETION) &&
1653616536 (hit5->indel_pos < 15 || hit5->indel_pos > hit5->querylength - 15)) {
1655216552
1655316553 if (hit3->hittype == TERMINAL) {
1655416554 /* Don't allow terminals to set trims */
16555
16556 } else if (hit3->hittype == INSERTION || hit3->hittype == DELETION) {
16557 /* Don't allow indels to set trims, since they artificially align at the end */
1655516558
1655616559 #if 0
1655716560 } else if ((hit3->hittype == INSERTION || hit3->hittype == DELETION) &&
1735717360 (*nconcordant)++;
1735817361 }
1735917362
17360 if (0 && *nconcordant > maxpairedpaths) {
17363 if (*nconcordant > maxpairedpaths) {
1736117364 debug(printf(" -- %d concordant paths exceeds %d",*nconcordant,maxpairedpaths));
1736217365 *abort_pairing_p = true;
1736317366 }
1747717480 (*nconcordant)++;
1747817481 }
1747917482
17480 if (0 && *nconcordant > maxpairedpaths) {
17483 if (*nconcordant > maxpairedpaths) {
1748117484 debug(printf(" -- %d concordant paths exceeds %d",*nconcordant,maxpairedpaths));
1748217485 *abort_pairing_p = true;
1748317486 }
0 /* $Id: stage3hr.h 195760 2016-08-04 00:12:04Z twu $ */
0 /* $Id: stage3hr.h 196273 2016-08-12 15:15:06Z twu $ */
11 #ifndef STAGE3HR_INCLUDED
22 #define STAGE3HR_INCLUDED
33
0 static char rcsid[] = "$Id: substring.c 195961 2016-08-08 16:36:34Z twu $";
0 static char rcsid[] = "$Id: substring.c 196404 2016-08-16 14:47:49Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
3030 #define SCRAMBLE_TEXT "scramble"
3131
3232 #define END_SPLICESITE_SEARCH 10
33 #define MIN_EXON_LENGTH 9
3334 #define END_SPLICESITE_PROB_MATCH 0.90
3435 #define END_SPLICESITE_PROB_MISMATCH 0.95
3536
190191 return "";
191192 }
192193
194 char *
195 Trimaction_string (Trimaction_T trimaction) {
196 switch (trimaction) {
197 case NO_TRIM: return "NO_TRIM";
198 case PRE_TRIMMED: return "PRE_TRIMMED";
199 case COMPUTE_TRIM: return "COMPUTE_TRIM";
200 default:
201 fprintf(stderr,"Unexpected trimaction %d\n",trimaction);
202 abort();
203 }
204 return "";
205 }
206
207
193208
194209 static char complCode[128] = COMPLEMENT_LC;
195210
296311 /* for splices */
297312 int chimera_sensedir;
298313
299 Univcoord_T splicecoord;
300 int splicesites_knowni; /* Needed for intragenic_splice_p in stage1hr.c */
301
302 bool chimera_knownp; /* Used for computing Substring_nchimera_known */
303 bool chimera_novelp;
304 Univcoord_T chimera_modelpos;
305 int chimera_pos;
306 double chimera_prob;
314 Univcoord_T splicecoord_D;
315 int splicesitesD_knowni; /* Needed for intragenic_splice_p in stage1hr.c */
316
317 bool siteD_knownp; /* Used for computing Substring_nchimera_known */
318 bool siteD_novelp;
319 int siteD_pos;
320 double siteD_prob;
307321
308322 /* for shortexon (always use *_1 for acceptor and *_2 for donor) */
309323 /* for donor/acceptor: the ambiguous position */
310 Univcoord_T splicecoord_2;
311 int splicesites_knowni_2;
312
313 bool chimera_knownp_2;
314 bool chimera_novelp_2;
315 Univcoord_T chimera_modelpos_2;
316 int chimera_pos_2;
317 double chimera_prob_2;
318
324 Univcoord_T splicecoord_A;
325 int splicesitesA_knowni;
326
327 bool siteA_knownp;
328 bool siteA_novelp;
329 int siteA_pos;
319330 double siteA_prob;
320 double siteD_prob;
331
332 Univcoord_T splicecoord_N; /* For DNA fusions */
333 int siteN_pos;
334
321335
322336 bool ambiguous_p;
323337 int nambcoords;
325339 int *amb_knowni;
326340 int *amb_nmismatches;
327341 double *amb_probs;
328 double amb_common_prob;
329 bool amb_donor_common_p;
342 Endtype_T amb_type; /* Ambiguous DONs or ACCs */
330343 };
331344
332345
354367 this->alignend += chrlength;
355368 this->alignstart_trim += chrlength;
356369 this->alignend_trim += chrlength;
357 this->chimera_modelpos += chrlength;
358 this->chimera_modelpos_2 += chrlength;
370 this->splicecoord_D += chrlength;
371 this->splicecoord_A += chrlength;
372 this->splicecoord_N += chrlength;
359373 }
360374
361375 return;
386400 this->alignend -= chrlength;
387401 this->alignstart_trim -= chrlength;
388402 this->alignend_trim -= chrlength;
389 this->chimera_modelpos -= chrlength;
390 this->chimera_modelpos_2 -= chrlength;
403 this->splicecoord_D -= chrlength;
404 this->splicecoord_A -= chrlength;
405 this->splicecoord_N -= chrlength;
391406 }
392407
393408 return;
628643 int trim5, alignlength, pos, prevpos, i;
629644 int nmismatches;
630645
631 #ifdef HAVE_ALLOCA
646 #if defined(LONG_READLENGTHS)
647 int *mismatch_positions = (int *) MALLOC(querylength*sizeof(int));
648 #elif defined(HAVE_ALLOCA)
632649 int *mismatch_positions = (int *) ALLOCA(querylength*sizeof(int));
633650 #else
634651 int mismatch_positions[MAX_READLENGTH];
734751 }
735752 }
736753
754 #if defined(LONG_READLENGTHS)
755 FREE(mismatch_positions);
756 #elif defined(HAVE_ALLOCA)
757 FREEA(mismatch_positions);
758 #else
759 /* Hard-coded use of MAX_READLENGTH */
760 #endif
737761
738762 debug8(printf("Trim left pos 0, score %d, trim5 %d, nmismatches_end %d\n",score,trim5,*nmismatches_end));
739763 debug8(printf("\n"));
752776 int trim3, alignlength, pos, prevpos, i;
753777 int nmismatches;
754778
755 #ifdef HAVE_ALLOCA
779 #if defined(LONG_READLENGTHS)
780 int *mismatch_positions = (int *) MALLOC(querylength*sizeof(int));
781 #elif defined(HAVE_ALLOCA)
756782 int *mismatch_positions = (int *) ALLOCA(querylength*sizeof(int));
757783 #else
758784 int mismatch_positions[MAX_READLENGTH];
854880 }
855881 }
856882
883 #if defined(LONG_READLENGTHS)
884 FREE(mismatch_positions);
885 #elif defined(HAVE_ALLOCA)
886 FREEA(mismatch_positions);
887 #else
888 /* Hard-coded use of MAX_READLENGTH */
889 #endif
857890
858891 debug8(printf("Trim right pos %d, score %d, trim3 %d, nmismatches_end %d\n",queryend-1,score,trim3,*nmismatches_end));
859892 debug8(printf("\n"));
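Review note: the two hunks above give `mismatch_positions` three possible homes, chosen at compile time: heap when `LONG_READLENGTHS` is defined, `alloca` when only `HAVE_ALLOCA` is defined, and a fixed `MAX_READLENGTH` array otherwise. The pattern can be sketched in isolation as below. This is a minimal sketch, not the code in substring.c: `count_mismatches` is a hypothetical helper, the value of `MAX_READLENGTH` is made up, and plain `malloc`/`free`/`alloca` stand in for the project's `MALLOC`/`FREE`/`ALLOCA`/`FREEA` wrappers.

```c
#include <assert.h>
#include <stdlib.h>
#if !defined(LONG_READLENGTHS) && defined(HAVE_ALLOCA)
#include <alloca.h>          /* assumption: platform provides <alloca.h> */
#endif

#define MAX_READLENGTH 300   /* hypothetical value, for illustration only */

/* Hypothetical helper: records mismatch positions in a scratch buffer
   and returns how many were found, or -1 if no buffer is available. */
int
count_mismatches (const char *query, const char *genomic, int querylength) {
  int nmismatches = 0, i;

  assert(query != NULL && genomic != NULL);
  if (querylength <= 0) {
    return 0;
  }

#if defined(LONG_READLENGTHS)
  /* Long reads: scratch space on the heap, released explicitly below */
  int *mismatch_positions = (int *) malloc(querylength*sizeof(int));
  if (mismatch_positions == NULL) {
    return -1;
  }
#elif defined(HAVE_ALLOCA)
  /* Short reads: scratch space on the stack, released on return */
  int *mismatch_positions = (int *) alloca(querylength*sizeof(int));
#else
  /* Fallback: fixed-size array, so longer reads must be rejected */
  int mismatch_positions[MAX_READLENGTH];
  if (querylength > MAX_READLENGTH) {
    return -1;
  }
#endif

  for (i = 0; i < querylength; i++) {
    if (query[i] != genomic[i]) {
      mismatch_positions[nmismatches++] = i;
    }
  }

#if defined(LONG_READLENGTHS)
  free(mismatch_positions);
#endif

  return nmismatches;
}
```

Compiled with no macros defined, the sketch takes the fixed-array branch; adding `-DLONG_READLENGTHS` switches it to the heap branch without touching the function body.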
17871820
17881821
17891822 /* Modified from trim_novel_spliceends in stage3.c */
1790 void
1823 /* Note: If substring does not extend to ends of query, then region
1824 beyond querystart and queryend might actually be matching, and not
1825 mismatches. Could fix in the future. */
1826 static void
17911827 substring_trim_novel_spliceends (T substring1, T substringN, int *ambig_end_length_5, int *ambig_end_length_3,
17921828 Splicetype_T *ambig_splicetype_5, Splicetype_T *ambig_splicetype_3,
17931829 double *ambig_prob_5, double *ambig_prob_3, int *sensedir,
18071843 int splice_sensedir_5, splice_sensedir_3, splice_sensedir_5_mm, splice_sensedir_3_mm;
18081844
18091845
1810 debug13(printf("\nEntered Substring_trim_novel_spliceends with sensedir %d\n",*sensedir));
1846 debug13(printf("\nEntered substring_trim_novel_spliceends with sensedir %d\n",*sensedir));
18111847 *ambig_end_length_5 = 0;
18121848 *ambig_end_length_3 = 0;
1849 *ambig_prob_5 = 0.0;
1850 *ambig_prob_3 = 0.0;
18131851
18141852 /* start is distal, end is medial */
18151853 if (substringN == NULL) {
18161854 /* Skip 3' end*/
18171855 } else if (substringN->plusp == true) {
1818 start = substringN->genomicend;
18191856 middle = substringN->alignend_trim + 1;
1820 if ((end = middle - END_SPLICESITE_SEARCH) < substringN->alignstart_trim) {
1821 end = substringN->alignstart_trim;
1822 }
1823 } else {
1824 start = substringN->genomicend;
1857 if ((start = middle + END_SPLICESITE_SEARCH) > substringN->genomicend) {
1858 start = substringN->genomicend;
1859 }
1860 if ((end = middle - END_SPLICESITE_SEARCH) < substringN->alignstart_trim + MIN_EXON_LENGTH) {
1861 end = substringN->alignstart_trim + MIN_EXON_LENGTH;
1862 }
1863 debug13(printf("\n1 Set end points for 3' trim to be %u..%u..%u\n",start,middle,end));
1864
1865 } else {
18251866 middle = substringN->alignend_trim - 1;
1826 if ((end = middle + END_SPLICESITE_SEARCH) > substringN->alignstart_trim) {
1827 end = substringN->alignstart_trim;
1828 }
1867 if ((start = middle - END_SPLICESITE_SEARCH) < substringN->genomicend) {
1868 start = substringN->genomicend;
1869 }
1870 if ((end = middle + END_SPLICESITE_SEARCH) > substringN->alignstart_trim - MIN_EXON_LENGTH) {
1871 end = substringN->alignstart_trim - MIN_EXON_LENGTH;
1872 }
1873 debug13(printf("\n2 Set end points for 3' trim to be %u..%u..%u\n",start,middle,end));
18291874 }
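Review note: for the plus-strand 3' case, the new code searches at most `END_SPLICESITE_SEARCH` bases on either side of `middle`. It caps the distal side at `genomicend` and keeps the medial side at least `MIN_EXON_LENGTH` bases past `alignstart_trim`. The sketch below isolates that clamping under stated assumptions: `Window_T` and `window_3prime_plus` are hypothetical names, and `Univcoord_T` is taken to be `unsigned int` here although its real width depends on the build.

```c
#include <assert.h>

typedef unsigned int Univcoord_T;   /* assumption: actual width depends on the build */

#define END_SPLICESITE_SEARCH 10
#define MIN_EXON_LENGTH 9

/* Hypothetical container for the three search coordinates */
typedef struct {
  Univcoord_T start;    /* distal limit */
  Univcoord_T middle;   /* first position past the trimmed alignment */
  Univcoord_T end;      /* medial limit */
} Window_T;

/* Plus-strand 3' window, following the arithmetic in the hunk above */
Window_T
window_3prime_plus (Univcoord_T alignstart_trim, Univcoord_T alignend_trim,
                    Univcoord_T genomicend) {
  Window_T w;

  w.middle = alignend_trim + 1;
  /* Guard against unsigned wrap-around in the subtraction below */
  assert(w.middle >= END_SPLICESITE_SEARCH);

  if ((w.start = w.middle + END_SPLICESITE_SEARCH) > genomicend) {
    w.start = genomicend;
  }
  if ((w.end = w.middle - END_SPLICESITE_SEARCH) < alignstart_trim + MIN_EXON_LENGTH) {
    w.end = alignstart_trim + MIN_EXON_LENGTH;
  }
  return w;
}
```

With `alignstart_trim = 1000`, `alignend_trim = 1012`, and `genomicend = 1100`, the unclamped medial limit would be 1003, so the exon-length floor moves it to 1009. The minus-strand branch mirrors this with the comparisons reversed.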
18301875
18311876 if (substringN == NULL) {
18381883 middle_genomicpos = middle;
18391884 end_genomicpos = end;
18401885
1841 assert(start_genomicpos >= end_genomicpos);
1886 /* assert(start_genomicpos >= end_genomicpos); */
18421887 genomicpos = start_genomicpos;
18431888 while (genomicpos >= middle_genomicpos) {
18441889 donor_prob = Maxent_hr_donor_prob(genomicpos,chroffset); /* Case 1 */
18671912 middle_genomicpos = middle;
18681913 end_genomicpos = end;
18691914
1870 assert(start_genomicpos <= end_genomicpos);
1915 /* assert(start_genomicpos <= end_genomicpos); */
18711916 genomicpos = start_genomicpos;
18721917 while (genomicpos <= middle_genomicpos) {
18731918 donor_prob = Maxent_hr_antidonor_prob(genomicpos,chroffset); /* Case 3 */
18981943 middle_genomicpos = middle;
18991944 end_genomicpos = end;
19001945
1901 assert(start_genomicpos >= end_genomicpos);
1946 /* assert(start_genomicpos >= end_genomicpos); */
19021947 genomicpos = start_genomicpos;
19031948 while (genomicpos >= middle_genomicpos) {
19041949 acceptor_prob = Maxent_hr_antiacceptor_prob(genomicpos,chroffset); /* Case 5 */
19271972 middle_genomicpos = middle;
19281973 end_genomicpos = end;
19291974
1930 assert(start_genomicpos <= end_genomicpos);
1975 /* assert(start_genomicpos <= end_genomicpos); */
19311976 genomicpos = start_genomicpos;
19321977 while (genomicpos <= middle_genomicpos) {
19331978 acceptor_prob = Maxent_hr_acceptor_prob(genomicpos,chroffset); /* Case 7 */
19562001 middle_genomicpos = middle;
19572002 end_genomicpos = end;
19582003
1959 assert(start_genomicpos >= end_genomicpos);
2004 /* assert(start_genomicpos >= end_genomicpos); */
19602005 genomicpos = start_genomicpos;
19612006 while (genomicpos >= middle_genomicpos) {
19622007 donor_prob = Maxent_hr_donor_prob(genomicpos,chroffset); /* Case 1 */
20172062 middle_genomicpos = middle;
20182063 end_genomicpos = end;
20192064
2020 assert(start_genomicpos <= end_genomicpos);
2065 /* assert(start_genomicpos <= end_genomicpos); */
20212066 genomicpos = start_genomicpos;
20222067 while (genomicpos <= middle_genomicpos) {
20232068 donor_prob = Maxent_hr_antidonor_prob(genomicpos,chroffset); /* Case 3 */
20832128 Splicetype_string(splicetype3),splice_genomepos_3-chroffset,max_prob_3));
20842129 if (substringN->plusp) {
20852130 *ambig_end_length_3 = substringN->genomicend - splice_genomepos_3;
2131 debug13(printf("Set ambig_end_length_3 to be %d = %u - %u\n",*ambig_end_length_3,substringN->genomicend,splice_genomepos_3));
20862132 } else {
20872133 *ambig_end_length_3 = splice_genomepos_3 - substringN->genomicend;
2134 debug13(printf("Set ambig_end_length_3 to be %d = %u - %u\n",*ambig_end_length_3,splice_genomepos_3,substringN->genomicend));
20882135 }
20892136 *ambig_splicetype_3 = splicetype3;
20902137 *ambig_prob_3 = max_prob_3;
2091 debug13(printf("Set ambig_end_length_3 to be %d\n",*ambig_end_length_3));
20922138
20932139 } else if (max_prob_3_mm > END_SPLICESITE_PROB_MISMATCH) {
20942140 debug13(printf("Found good mismatch splice %s on 3' end at %u with probability %f\n",
20952141 Splicetype_string(splicetype3_mm),splice_genomepos_3_mm-chroffset,max_prob_3_mm));
20962142 if (substringN->plusp) {
20972143 *ambig_end_length_3 = substringN->genomicend - splice_genomepos_3_mm;
2144 debug13(printf("Set ambig_end_length_3 to be %d = %u - %u\n",*ambig_end_length_3,substringN->genomicend,splice_genomepos_3_mm));
20982145 } else {
20992146 *ambig_end_length_3 = splice_genomepos_3_mm - substringN->genomicend;
2147 debug13(printf("Set ambig_end_length_3 to be %d = %u - %u\n",*ambig_end_length_3,splice_genomepos_3_mm,substringN->genomicend));
21002148 }
21012149 *ambig_splicetype_3 = splicetype3_mm;
21022150 *ambig_prob_3 = max_prob_3_mm;
2103 debug13(printf("Set ambig_end_length_3 to be %d\n",*ambig_end_length_3));
21042151 }
21052152 }
21062153
21092156 if (substring1 == NULL) {
21102157 /* Skip 5' end */
21112158 } else if (substring1->plusp == true) {
2112 start = substring1->genomicstart;
21132159 middle = substring1->alignstart_trim - 1;
2114 if ((end = middle + END_SPLICESITE_SEARCH) > substring1->alignend_trim) {
2115 end = substring1->alignend_trim;
2116 }
2117 } else {
2118 start = substring1->genomicstart;
2160 if ((start = middle - END_SPLICESITE_SEARCH) < substring1->genomicstart) {
2161 start = substring1->genomicstart;
2162 }
2163 if ((end = middle + END_SPLICESITE_SEARCH) > substring1->alignend_trim - MIN_EXON_LENGTH) {
2164 end = substring1->alignend_trim - MIN_EXON_LENGTH;
2165 }
2166 debug13(printf("\n1 Set end points for 5' trim to be %u..%u..%u\n",start,middle,end));
2167
2168 } else {
21192169 middle = substring1->alignstart_trim + 1;
2120 if ((end = middle - END_SPLICESITE_SEARCH) < substring1->alignend_trim) {
2121 end = substring1->alignend_trim;
2122 }
2170 if ((start = middle + END_SPLICESITE_SEARCH) > substring1->genomicstart) {
2171 start = substring1->genomicstart;
2172 }
2173 if ((end = middle - END_SPLICESITE_SEARCH) < substring1->alignend_trim + MIN_EXON_LENGTH) {
2174 end = substring1->alignend_trim + MIN_EXON_LENGTH;
2175 }
2176 debug13(printf("\n2 Set end points for 5' trim to be %u..%u..%u\n",start,middle,end));
21232177 }
21242178
21252179 if (substring1 == NULL) {
21322186 middle_genomicpos = middle;
21332187 end_genomicpos = end;
21342188
2135 assert(start_genomicpos <= end_genomicpos);
2189 /* assert(start_genomicpos <= end_genomicpos); */
21362190 genomicpos = start_genomicpos;
21372191 while (genomicpos <= middle_genomicpos) {
21382192 acceptor_prob = Maxent_hr_acceptor_prob(genomicpos,chroffset); /* Case 2 */
21612215 middle_genomicpos = middle;
21622216 end_genomicpos = end;
21632217
2164 assert(start_genomicpos >= end_genomicpos);
2218 /* assert(start_genomicpos >= end_genomicpos); */
21652219 genomicpos = start_genomicpos;
21662220 while (genomicpos >= middle_genomicpos) {
21672221 acceptor_prob = Maxent_hr_antiacceptor_prob(genomicpos,chroffset); /* Case 4 */
21922246 middle_genomicpos = middle;
21932247 end_genomicpos = end;
21942248
2195 assert(start_genomicpos <= end_genomicpos);
2249 /* assert(start_genomicpos <= end_genomicpos); */
21962250 genomicpos = start_genomicpos;
21972251 while (genomicpos <= middle_genomicpos) {
21982252 donor_prob = Maxent_hr_antidonor_prob(genomicpos,chroffset); /* Case 6 */
22212275 middle_genomicpos = middle;
22222276 end_genomicpos = end;
22232277
2224 assert(start_genomicpos >= end_genomicpos);
2278 /* assert(start_genomicpos >= end_genomicpos); */
22252279 genomicpos = start_genomicpos;
22262280 while (genomicpos >= middle_genomicpos) {
22272281 donor_prob = Maxent_hr_donor_prob(genomicpos,chroffset); /* Case 8 */
22502304 middle_genomicpos = middle;
22512305 end_genomicpos = end;
22522306
2253 assert(start_genomicpos <= end_genomicpos);
2307 /* assert(start_genomicpos <= end_genomicpos); */
22542308 genomicpos = start_genomicpos;
22552309 while (genomicpos <= middle_genomicpos) {
22562310 acceptor_prob = Maxent_hr_acceptor_prob(genomicpos,chroffset); /* Case 2 */
23112365 middle_genomicpos = middle;
23122366 end_genomicpos = end;
23132367
2314 assert(start_genomicpos >= end_genomicpos);
2368 /* assert(start_genomicpos >= end_genomicpos); */
23152369 genomicpos = start_genomicpos;
23162370 while (genomicpos >= middle_genomicpos) {
23172371 acceptor_prob = Maxent_hr_antiacceptor_prob(genomicpos,chroffset); /* Case 4 */
23772431 Splicetype_string(splicetype5),splice_genomepos_5-chroffset,max_prob_5));
23782432 if (substring1->plusp) {
23792433 *ambig_end_length_5 = splice_genomepos_5 - substring1->genomicstart;
2434 debug13(printf("1 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,splice_genomepos_5,substring1->genomicstart));
23802435 } else {
23812436 *ambig_end_length_5 = substring1->genomicstart - splice_genomepos_5;
2437 debug13(printf("2 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,substring1->genomicstart,splice_genomepos_5));
23822438 }
23832439 *ambig_splicetype_5 = splicetype5;
23842440 *ambig_prob_5 = max_prob_5;
2385 debug13(printf("Set ambig_end_length_5 to be %d\n",*ambig_end_length_5));
23862441 } else if (max_prob_5_mm > END_SPLICESITE_PROB_MISMATCH) {
23872442 debug13(printf("Found good mismatch splice %s on 5' end at %u with probability %f\n",
23882443 Splicetype_string(splicetype5_mm),splice_genomepos_5_mm-chroffset,max_prob_5_mm));
23892444 if (substring1->plusp) {
2390 *ambig_end_length_5 = splice_genomepos_5 - substring1->genomicstart;
2445 *ambig_end_length_5 = splice_genomepos_5_mm - substring1->genomicstart;
2446 debug13(printf("3 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,splice_genomepos_5_mm,substring1->genomicstart));
23912447 } else {
2392 *ambig_end_length_5 = substring1->genomicstart - splice_genomepos_5;
2448 *ambig_end_length_5 = substring1->genomicstart - splice_genomepos_5_mm;
2449 debug13(printf("4 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,substring1->genomicstart,splice_genomepos_5_mm));
23932450 }
23942451 *ambig_splicetype_5 = splicetype5_mm;
23952452 *ambig_prob_5 = max_prob_5_mm;
2396 debug13(printf("Set ambig_end_length_5 to be %d\n",*ambig_end_length_5));
23972453 }
23982454 }
23992455
24662522 Splicetype_string(splicetype5),splice_genomepos_5-chroffset,max_prob_5));
24672523 if (substring1->plusp) {
24682524 *ambig_end_length_5 = splice_genomepos_5 - substring1->genomicstart;
2525 debug13(printf("5 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,splice_genomepos_5,substring1->genomicstart));
24692526 } else {
24702527 *ambig_end_length_5 = substring1->genomicstart - splice_genomepos_5;
2528 debug13(printf("6 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,substring1->genomicstart,splice_genomepos_5));
24712529 }
24722530 *ambig_splicetype_5 = splicetype5;
24732531 *ambig_prob_5 = max_prob_5;
24742532 /* *cdna_direction = splice_cdna_direction_5; */
2475 debug13(printf("Set ambig_end_length_5 to be %d\n",*ambig_end_length_5));
24762533 if (max_prob_sense_forward_5 >= END_SPLICESITE_PROB_MATCH && max_prob_sense_anti_5 < END_SPLICESITE_PROB_MATCH
24772534 && max_prob_sense_anti_3 < END_SPLICESITE_PROB_MATCH) {
24782535 *sensedir = splice_sensedir_5;
24902547 Splicetype_string(splicetype3_mm),splice_genomepos_3_mm-chroffset,max_prob_3_mm));
24912548 if (substringN->plusp) {
24922549 *ambig_end_length_3 = substringN->genomicend - splice_genomepos_3_mm;
2550 debug13(printf("Set ambig_end_length_3 to be %d = %u - %u\n",*ambig_end_length_3,substringN->genomicend,splice_genomepos_3_mm));
24932551 } else {
24942552 *ambig_end_length_3 = splice_genomepos_3_mm - substringN->genomicend;
2553 debug13(printf("Set ambig_end_length_3 to be %d = %u - %u\n",*ambig_end_length_3,splice_genomepos_3_mm,substringN->genomicend));
24952554 }
24962555 *ambig_splicetype_3 = splicetype3_mm;
24972556 *ambig_prob_3 = max_prob_3_mm;
24982557 /* *cdna_direction = splice_cdna_direction_3_mm; */
2499 debug13(printf("Set ambig_end_length_3 to be %d\n",*ambig_end_length_3));
25002558 if (max_prob_sense_forward_3_mm >= END_SPLICESITE_PROB_MISMATCH && max_prob_sense_anti_3_mm < END_SPLICESITE_PROB_MISMATCH
25012559 && max_prob_sense_anti_5_mm < END_SPLICESITE_PROB_MISMATCH) {
25022560 *sensedir = splice_sensedir_3_mm;
25112569 Splicetype_string(splicetype5_mm),splice_genomepos_5_mm-chroffset,max_prob_5_mm));
25122570 if (substring1->plusp) {
25132571 *ambig_end_length_5 = splice_genomepos_5_mm - substring1->genomicstart;
2572 debug13(printf("7 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,splice_genomepos_5_mm,substring1->genomicstart));
25142573 } else {
25152574 *ambig_end_length_5 = substring1->genomicstart - splice_genomepos_5_mm;
2575 debug13(printf("8 Set ambig_end_length_5 to be %d = %u - %u\n",*ambig_end_length_5,substring1->genomicstart,splice_genomepos_5_mm));
25162576 }
25172577 *ambig_splicetype_5 = splicetype5_mm;
25182578 *ambig_prob_5 = max_prob_5_mm;
25192579 /* *cdna_direction = splice_cdna_direction_5_mm; */
2520 debug13(printf("Set ambig_end_length_5 to be %d\n",*ambig_end_length_5));
25212580 if (max_prob_sense_forward_5_mm >= END_SPLICESITE_PROB_MISMATCH && max_prob_sense_anti_5_mm < END_SPLICESITE_PROB_MISMATCH
25222581 && max_prob_sense_anti_3_mm < END_SPLICESITE_PROB_MISMATCH) {
25232582 *sensedir = splice_sensedir_5_mm;
25312590 }
25322591 }
25332592
2534 debug13(printf("Returning ambig_end_length_5 %d and ambig_end_length_3 %d\n",*ambig_end_length_5,*ambig_end_length_3));
2593 debug13(printf("Returning ambig_end_length_5 %d and ambig_end_length_3 %d, probs %f and %f\n",
2594 *ambig_end_length_5,*ambig_end_length_3,*ambig_prob_5,*ambig_prob_3));
25352595 return;
25362596 }
25372597
25492609 int outofbounds_start, int outofbounds_end, int minlength, int sensedir) {
25502610 T new;
25512611 int nmatches;
2552 int nonterminal_trim = 0;
2612 /* int nonterminal_trim = 0; */
25532613
25542614 int ambig_end_length_5, ambig_end_length_3;
25552615 Splicetype_T ambig_splicetype_5, ambig_splicetype_3;
25562616 double ambig_prob_5, ambig_prob_3;
25572617 int nmismatches_end_left, nmismatches_end_right;
2618 int trim;
25582619
25592620
25602621 /* General test for goodness over original region */
25992660 new->plusp = plusp;
26002661 new->genestrand = genestrand;
26012662
2602 new->chimera_prob = 0.0;
2603 new->chimera_knownp = false;
2604 new->chimera_knownp_2 = false;
2605 new->chimera_novelp = false;
2606 new->chimera_novelp_2 = false;
2663 new->splicecoord_D = new->splicecoord_A = new->splicecoord_N = 0;
2664 new->siteD_pos = new->siteA_pos = new->siteN_pos = 0;
2665
2666 new->siteD_prob = new->siteA_prob = 0.0;
2667 new->siteD_knownp = new->siteA_knownp = false;
2668 new->siteD_novelp = new->siteA_novelp = false;
26072669
26082670 debug2(printf("\n***Entered Substring_new with query %d..%d, chrnum %d (chroffset %u, chrhigh %u), plusp %d, outofbounds start %d and end %d\n",
26092671 querystart,queryend,chrnum,chroffset,chrhigh,plusp,outofbounds_start,outofbounds_end));
26172679 new->genomicend = new->left + querylength;
26182680
26192681 debug2(printf("left is %u\n",new->left));
2682 debug2(printf("genomicstart is %u, genomicend is %u\n",new->genomicstart,new->genomicend));
26202683 debug2(printf("querylength is %d, alignstart is %u, alignend is %u\n",querylength,alignstart,alignend));
26212684 assert(alignstart + outofbounds_start >= chroffset);
26222685 assert(alignend - outofbounds_end <= chrhigh);
26272690 new->genomicstart = new->left + querylength;
26282691
26292692 debug2(printf("left is %u\n",new->left));
2693 debug2(printf("genomicstart is %u, genomicend is %u\n",new->genomicstart,new->genomicend));
26302694 debug2(printf("querylength is %d, alignstart is %u, alignend is %u\n",querylength,alignstart,alignend));
26312695 assert(alignstart - outofbounds_start <= chrhigh);
26322696 assert(alignend + outofbounds_end >= chroffset);
26542718 new->genomic_refdiff = (char *) NULL;
26552719
26562720 /* Do trimming */
2657 debug8(printf("trim_left_action %d, trim_right_action %d\n",trim_left_action,trim_right_action));
2721 debug8(printf("trim_left_action %s, trim_right_action %s\n",
2722 Trimaction_string(trim_left_action),Trimaction_string(trim_right_action)));
26582723
26592724 new->mandatory_trim_left = 0;
26602725 new->mandatory_trim_right = 0;
26692734 Substring_free(&new);
26702735 return (T) NULL;
26712736 } else {
2672 new->trim_left = 0;
2737 new->trim_left = querystart;
26732738 }
26742739
26752740 } else if (new->start_endtype == TERM) {
26762741 /* Accept true terminals generated by GSNAP procedure */
2677 new->trim_left = trim_left_end(&nmismatches_end_left,query_compress,new->left,querystart,queryend,querylength,
2678 plusp,genestrand,/*trim_mismatch_score*/-3);
2679 if (outofbounds_start > new->trim_left) {
2680 new->trim_left = outofbounds_start;
2681 }
2682 new->querystart += new->trim_left;
2683
2684 } else {
2685 new->trim_left = trim_left_end(&nmismatches_end_left,query_compress,new->left,querystart,queryend,querylength,
2686 plusp,genestrand,trim_mismatch_score);
2687 debug13(printf("trim_left %d, nmismatches_end_left = %d\n",new->trim_left,nmismatches_end_left));
2688 if (outofbounds_start > new->trim_left) {
2689 new->trim_left = outofbounds_start;
2690 }
2691 nonterminal_trim += new->trim_left;
2692 new->querystart += new->trim_left;
2742 trim = trim_left_end(&nmismatches_end_left,query_compress,new->left,querystart,queryend,querylength,
2743 plusp,genestrand,/*trim_mismatch_score*/-3);
2744 debug8(printf("trim_left_end: trim_left +%d from querystart %d, nmismatches_end_left = %d\n",
2745 trim,querystart,nmismatches_end_left));
2746 if (outofbounds_start > querystart + trim) {
2747 trim = outofbounds_start - querystart;
2748 }
2749 new->querystart += trim;
2750 new->trim_left = new->querystart;
2751 debug8(printf("querystart is now %d\n",new->querystart));
2752
2753 } else {
2754 trim = trim_left_end(&nmismatches_end_left,query_compress,new->left,querystart,queryend,querylength,
2755 plusp,genestrand,trim_mismatch_score);
2756 debug8(printf("trim_left_end: trim_left +%d from querystart %d, nmismatches_end_left = %d\n",
2757 trim,querystart,nmismatches_end_left));
2758 if (outofbounds_start > querystart + trim) {
2759 trim = outofbounds_start - querystart;
2760 }
2761 /* nonterminal_trim += new->trim_left; */
2762 new->querystart += trim;
2763 new->trim_left = new->querystart;
2764 debug8(printf("querystart is now %d\n",new->querystart));
26932765 }
26942766
26952767 if (trim_right_action == PRE_TRIMMED) {
27022774 debug2(printf("outofbounds_end %d > 0, so returning NULL\n",outofbounds_end));
27032775 return (T) NULL;
27042776 } else {
2705 new->trim_right = 0;
2777 new->trim_right = querylength - queryend;
27062778 }
27072779
27082780 } else if (new->end_endtype == TERM) {
27092781 /* Accept true terminals generated by GSNAP procedure */
2710 new->trim_right = trim_right_end(&nmismatches_end_right,query_compress,new->left,querystart,queryend,querylength,
2711 plusp,genestrand,/*trim_mismatch_score*/-3);
2712 if (outofbounds_end > new->trim_right) {
2713 new->trim_right = outofbounds_end;
2714 }
2715 new->queryend -= new->trim_right;
2716
2717 } else {
2718 new->trim_right = trim_right_end(&nmismatches_end_right,query_compress,new->left,querystart,queryend,querylength,
2719 plusp,genestrand,trim_mismatch_score);
2720 debug13(printf("trim_right %d, nmismatches_end_right = %d\n",new->trim_right,nmismatches_end_right));
2721 if (outofbounds_end > new->trim_right) {
2722 new->trim_right = outofbounds_end;
2723 }
2724 nonterminal_trim += new->trim_right;
2725 new->queryend -= new->trim_right;
2782 trim = trim_right_end(&nmismatches_end_right,query_compress,new->left,querystart,queryend,querylength,
2783 plusp,genestrand,/*trim_mismatch_score*/-3);
2784 debug8(printf("trim_right_end: trim_right +%d from queryend %d, nmismatches_end_right = %d\n",
2785 trim,queryend,nmismatches_end_right));
2786 if (outofbounds_end > queryend - trim) {
2787 trim = queryend - outofbounds_end;
2788 }
2789 new->queryend -= trim;
2790 new->trim_right = querylength - new->queryend;
2791 debug8(printf("queryend is now %d\n",new->queryend));
2792
2793 } else {
2794 trim = trim_right_end(&nmismatches_end_right,query_compress,new->left,querystart,queryend,querylength,
2795 plusp,genestrand,trim_mismatch_score);
2796 debug8(printf("trim_right_end: trim_right +%d from queryend %d, nmismatches_end_right = %d\n",
2797 trim,queryend,nmismatches_end_right));
2798 if (outofbounds_end > queryend - trim) {
2799 trim = queryend - outofbounds_end;
2800 }
2801 /* nonterminal_trim += new->trim_right; */
2802 new->queryend -= trim;
2803 new->trim_right = querylength - new->queryend;
2804 debug8(printf("queryend is now %d\n",new->queryend));
27262805 }
27272806
27282807 #if 0
27572836 new->nmatches = (new->alignend - new->alignstart) - new->nmismatches_whole;
27582837
27592838 if (trim_left_action == COMPUTE_TRIM) {
2760 if (nmismatches_end_left == 0) {
2839 if (querystart == 0 && nmismatches_end_left == 0) {
2840 debug8(printf("querystart is 0 and nmismatches_end_left is 0, so setting left_end_action to be NO_TRIM\n"));
27612841 trim_left_action = NO_TRIM;
27622842 } else {
2763 new->alignstart_trim += new->trim_left;
2843 new->alignstart_trim = new->genomicstart + new->trim_left;
27642844 }
27652845 }
27662846 if (trim_right_action == COMPUTE_TRIM) {
2767 if (nmismatches_end_right == 0) {
2847 if (queryend == querylength && nmismatches_end_right == 0) {
2848 debug8(printf("queryend is querylength and nmismatches_end_right is 0, so setting right_end_action to be NO_TRIM\n"));
27682849 trim_right_action = NO_TRIM;
27692850 } else {
2770 new->alignend_trim -= new->trim_right;
2851 new->alignend_trim = new->genomicend - new->trim_right;
27712852 }
27722853 }
27732854 debug2(printf("Got trims of %d and %d => Revised alignstart_trim and alignend_trim to be %u..%u (%u..%u)\n",
27742855 new->trim_left,new->trim_right,new->alignstart_trim,new->alignend_trim,
27752856 new->alignstart_trim - new->chroffset,new->alignend_trim - new->chroffset));
2857 debug2(printf("genomicstart is %u, genomicend is %u\n",new->genomicstart,new->genomicend));
27762858
27772859 new->trim_left_splicep = new->trim_right_splicep = false;
27782860 if (novelsplicingp == true) {
27822864 &sensedir,chroffset);
27832865 if (ambig_end_length_5 > 0) {
27842866 new->trim_left_splicep = true;
2785 new->querystart += (ambig_end_length_5 - new->trim_left);
2786 new->alignstart_trim += (ambig_end_length_5 - new->trim_left);
2867 /* new->querystart += (ambig_end_length_5 - new->trim_left); */
2868 /* new->alignstart_trim += (ambig_end_length_5 - new->trim_left); */
2869 new->querystart = ambig_end_length_5;
2870 new->alignstart_trim = new->genomicstart + ambig_end_length_5;
2871
27872872 new->trim_left = ambig_end_length_5;
27882873 if (ambig_splicetype_5 == DONOR || ambig_splicetype_5 == ANTIDONOR) {
27892874 new->start_endtype = DON;
2875 new->siteD_prob = ambig_prob_5;
27902876 } else {
27912877 new->start_endtype = ACC;
2878 new->siteA_prob = ambig_prob_5;
27922879 }
2793 new->chimera_prob = ambig_prob_5;
27942880 }
27952881 if (ambig_end_length_3 > 0) {
27962882 new->trim_right_splicep = true;
2797 new->queryend -= (ambig_end_length_3 - new->trim_right);
2798 new->alignend_trim -= (ambig_end_length_3 - new->trim_right);
2883 /* new->queryend -= (ambig_end_length_3 - new->trim_right); */
2884 /* new->alignend_trim -= (ambig_end_length_3 - new->trim_right); */
2885 new->queryend = querylength - ambig_end_length_3;
2886 new->alignend_trim = new->genomicend - ambig_end_length_3;
2887
27992888 new->trim_right = ambig_end_length_3;
28002889 if (ambig_splicetype_3 == DONOR || ambig_splicetype_3 == ANTIDONOR) {
28012890 new->end_endtype = DON;
2891 new->siteD_prob = ambig_prob_3;
28022892 } else {
28032893 new->end_endtype = ACC;
2894 new->siteA_prob = ambig_prob_3;
28042895 }
2805 new->chimera_prob_2 = ambig_prob_3;
28062896 }
28072897
28082898 } else if (trim_left_action == COMPUTE_TRIM) {
28112901 &sensedir,chroffset);
28122902 if (ambig_end_length_5 > 0) {
28132903 new->trim_left_splicep = true;
2814 new->querystart += (ambig_end_length_5 - new->trim_left);
2815 new->alignstart_trim += (ambig_end_length_5 - new->trim_left);
2904 /* new->querystart += (ambig_end_length_5 - new->trim_left); */
2905 /* new->alignstart_trim += (ambig_end_length_5 - new->trim_left); */
2906 new->querystart = ambig_end_length_5;
2907 new->alignstart_trim = new->genomicstart + ambig_end_length_5;
2908
28162909 new->trim_left = ambig_end_length_5;
28172910 if (ambig_splicetype_5 == DONOR || ambig_splicetype_5 == ANTIDONOR) {
28182911 new->start_endtype = DON;
2912 new->siteD_prob = ambig_prob_5;
28192913 } else {
28202914 new->start_endtype = ACC;
2915 new->siteA_prob = ambig_prob_5;
28212916 }
2822 new->chimera_prob = ambig_prob_5;
28232917 }
28242918
28252919 } else if (trim_right_action == COMPUTE_TRIM) {
28282922 &sensedir,chroffset);
28292923 if (ambig_end_length_3 > 0) {
28302924 new->trim_right_splicep = true;
2831 new->queryend -= (ambig_end_length_3 - new->trim_right);
2832 new->alignend_trim -= (ambig_end_length_3 - new->trim_right);
2925 /* new->queryend -= (ambig_end_length_3 - new->trim_right); */
2926 /* new->alignend_trim -= (ambig_end_length_3 - new->trim_right); */
2927 new->queryend = querylength - ambig_end_length_3;
2928 new->alignend_trim = new->genomicend - ambig_end_length_3;
2929
28332930 new->trim_right = ambig_end_length_3;
28342931 if (ambig_splicetype_3 == DONOR || ambig_splicetype_3 == ANTIDONOR) {
28352932 new->end_endtype = DON;
2933 new->siteD_prob = ambig_prob_3;
28362934 } else {
28372935 new->end_endtype = ACC;
2936 new->siteA_prob = ambig_prob_3;
28382937 }
2839 new->chimera_prob_2 = ambig_prob_3;
28402938 }
28412939 }
28422940 }
28462944 new->nmatches = (new->alignstart - new->alignend) - new->nmismatches_whole;
28472945
28482946 if (trim_left_action == COMPUTE_TRIM) {
2849 if (nmismatches_end_left == 0) {
2947 if (querystart == 0 && nmismatches_end_left == 0) {
2948 debug8(printf("querystart is 0 and nmismatches_end_left is 0, so setting trim_left_action to be NO_TRIM\n"));
28502949 trim_left_action = NO_TRIM;
28512950 } else {
2852 new->alignstart_trim -= new->trim_left;
2951 new->alignstart_trim = new->genomicstart - new->trim_left;
28532952 }
28542953 }
28552954 if (trim_right_action == COMPUTE_TRIM) {
2856 if (nmismatches_end_right == 0) {
2955 if (queryend == querylength && nmismatches_end_right == 0) {
2956 debug8(printf("queryend is querylength and nmismatches_end_right is 0, so setting trim_right_action to be NO_TRIM\n"));
28572957 trim_right_action = NO_TRIM;
28582958 } else {
2859 new->alignend_trim += new->trim_right;
2959 new->alignend_trim = new->genomicend + new->trim_right;
28602960 }
28612961 }
28622962 debug2(printf("Revised alignstart_trim and alignend_trim to be %u..%u (%u..%u)\n",
28712971 &sensedir,chroffset);
28722972 if (ambig_end_length_5 > 0) {
28732973 new->trim_left_splicep = true;
2874 new->querystart += (ambig_end_length_5 - new->trim_left);
2875 new->alignstart_trim -= (ambig_end_length_5 - new->trim_left);
2974 /* new->querystart += (ambig_end_length_5 - new->trim_left); */
2975 /* new->alignstart_trim -= (ambig_end_length_5 - new->trim_left); */
2976 new->querystart = ambig_end_length_5;
2977 new->alignstart_trim = new->genomicstart - ambig_end_length_5;
2978
28762979 new->trim_left = ambig_end_length_5;
28772980 if (ambig_splicetype_5 == DONOR || ambig_splicetype_5 == ANTIDONOR) {
28782981 new->start_endtype = DON;
2982 new->siteD_prob = ambig_prob_5;
28792983 } else {
28802984 new->start_endtype = ACC;
2985 new->siteA_prob = ambig_prob_5;
28812986 }
2882 new->chimera_prob = ambig_prob_5;
28832987 }
28842988 if (ambig_end_length_3 > 0) {
28852989 new->trim_right_splicep = true;
2886 new->queryend -= (ambig_end_length_3 - new->trim_right);
2887 new->alignend_trim += (ambig_end_length_3 - new->trim_right);
2990 /* new->queryend -= (ambig_end_length_3 - new->trim_right); */
2991 /* new->alignend_trim += (ambig_end_length_3 - new->trim_right); */
2992 new->queryend = querylength - ambig_end_length_3;
2993 new->alignend_trim = new->genomicend + ambig_end_length_3;
2994
28882995 new->trim_right = ambig_end_length_3;
28892996 if (ambig_splicetype_3 == DONOR || ambig_splicetype_3 == ANTIDONOR) {
28902997 new->end_endtype = DON;
2998 new->siteD_prob = ambig_prob_3;
28912999 } else {
28923000 new->end_endtype = ACC;
3001 new->siteA_prob = ambig_prob_3;
28933002 }
2894 new->chimera_prob_2 = ambig_prob_3;
28953003 }
28963004
28973005 } else if (trim_left_action == COMPUTE_TRIM) {
29003008 &sensedir,chroffset);
29013009 if (ambig_end_length_5 > 0) {
29023010 new->trim_left_splicep = true;
2903 new->querystart += (ambig_end_length_5 - new->trim_left);
2904 new->alignstart_trim -= (ambig_end_length_5 - new->trim_left);
3011 /* new->querystart += (ambig_end_length_5 - new->trim_left); */
3012 /* new->alignstart_trim -= (ambig_end_length_5 - new->trim_left); */
3013 new->querystart = ambig_end_length_5;
3014 new->alignstart_trim = new->genomicstart - ambig_end_length_5;
3015
29053016 new->trim_left = ambig_end_length_5;
29063017 if (ambig_splicetype_5 == DONOR || ambig_splicetype_5 == ANTIDONOR) {
29073018 new->start_endtype = DON;
3019 new->siteD_prob = ambig_prob_5;
29083020 } else {
29093021 new->start_endtype = ACC;
3022 new->siteA_prob = ambig_prob_5;
29103023 }
2911 new->chimera_prob = ambig_prob_5;
29123024 }
29133025
29143026 } else if (trim_right_action == COMPUTE_TRIM) {
29173029 &sensedir,chroffset);
29183030 if (ambig_end_length_3 > 0) {
29193031 new->trim_right_splicep = true;
2920 new->queryend -= (ambig_end_length_3 - new->trim_right);
2921 new->alignend_trim += (ambig_end_length_3 - new->trim_right);
3032 /* new->queryend -= (ambig_end_length_3 - new->trim_right); */
3033 /* new->alignend_trim += (ambig_end_length_3 - new->trim_right); */
3034 new->queryend = querylength - ambig_end_length_3;
3035 new->alignend_trim = new->genomicend + ambig_end_length_3;
3036
29223037 new->trim_right = ambig_end_length_3;
29233038 if (ambig_splicetype_3 == DONOR || ambig_splicetype_3 == ANTIDONOR) {
29243039 new->end_endtype = DON;
3040 new->siteD_prob = ambig_prob_3;
29253041 } else {
29263042 new->end_endtype = ACC;
3043 new->siteA_prob = ambig_prob_3;
29273044 }
2928 new->chimera_prob_2 = ambig_prob_3;
29293045 }
29303046 }
29313047 }
29633079 new->amb_knowni = (int *) NULL;
29643080 new->amb_nmismatches = (int *) NULL;
29653081 new->amb_probs = (double *) NULL;
2966 new->amb_common_prob = 0.0;
2967 new->amb_donor_common_p = false;
2968
2969 debug2(printf("Returning substring %p\n",new));
3082 new->amb_type = END;
3083
3084 debug2(printf("Returning substring %p, query %d..%d, trim %d..%d\n",
3085 new,new->querystart,new->queryend,new->trim_left,new->trim_right));
29703086 return new;
29713087 }
29723088
29733089
29743090 T
2975 Substring_new_ambig (int querystart, int queryend, int splice_pos, int querylength,
2976 Chrnum_T chrnum, Univcoord_T chroffset,
2977 Univcoord_T chrhigh, Chrpos_T chrlength,
2978 bool plusp, int genestrand,
3091 Substring_new_ambig_D (int querystart, int queryend, int splice_pos, int querylength,
3092 Chrnum_T chrnum, Univcoord_T chroffset,
3093 Univcoord_T chrhigh, Chrpos_T chrlength,
3094 bool plusp, int genestrand,
29793095 #ifdef LARGE_GENOMES
2980 Uint8list_T ambcoords,
3096 Uint8list_T ambcoords,
29813097 #else
2982 Uintlist_T ambcoords,
2983 #endif
2984 Intlist_T amb_knowni, Intlist_T amb_nmismatches, Doublelist_T amb_probs,
2985 double amb_common_prob, bool amb_donor_common_p, bool substring1p) {
3098 Uintlist_T ambcoords,
3099 #endif
3100 Intlist_T amb_knowni, Intlist_T amb_nmismatches, Doublelist_T amb_probs,
3101 double amb_common_prob, bool substring1p) {
29863102 int ignore;
29873103 T new = (T) MALLOC_OUT(sizeof(*new));
29883104
2989 debug2(printf("Entered Substring_new_ambig with chrnum %d (chroffset %u, chrhigh %u), %d..%d, querylength %d, plusp %d\n",
3105 debug2(printf("Entered Substring_new_ambig_D with chrnum %d (chroffset %u, chrhigh %u), %d..%d, querylength %d, plusp %d\n",
29903106 chrnum,chroffset,chrhigh,querystart,queryend,querylength,plusp));
29913107
29923108 new->exactp = false;
30193135
30203136 new->querystart_orig = new->querystart = querystart;
30213137 new->queryend_orig = new->queryend = queryend;
3022 new->amb_splice_pos = splice_pos;
30233138 new->querylength = querylength;
30243139
30253140 new->alignstart = new->alignstart_trim = 0;
30283143 new->plusp = plusp;
30293144 new->genestrand = genestrand;
30303145
3031 new->chimera_prob = 0.0;
3032 new->chimera_knownp = false;
3033 new->chimera_knownp_2 = false;
3034 new->chimera_novelp = false;
3035 new->chimera_novelp_2 = false;
3146 new->siteD_knownp = new->siteA_knownp = false;
3147 new->siteD_novelp = new->siteA_novelp = false;
3148
3149 new->siteD_prob = 0.0;
3150 new->siteA_prob = amb_common_prob;
30363151
30373152 new->nmismatches_bothdiff = new->nmismatches_whole = Intlist_min(amb_nmismatches);
30383153
30743189 new->amb_knowni = Intlist_to_array_out(&ignore,amb_knowni);
30753190 new->amb_nmismatches = Intlist_to_array_out(&ignore,amb_nmismatches);
30763191 new->amb_probs = Doublelist_to_array_out(&ignore,amb_probs);
3077 new->amb_common_prob = amb_common_prob;
3078 new->amb_donor_common_p = amb_donor_common_p;
3192 new->amb_splice_pos = splice_pos;
3193 new->amb_type = DON;
3194
3195 return new;
3196 }
3197
3198 T
3199 Substring_new_ambig_A (int querystart, int queryend, int splice_pos, int querylength,
3200 Chrnum_T chrnum, Univcoord_T chroffset,
3201 Univcoord_T chrhigh, Chrpos_T chrlength,
3202 bool plusp, int genestrand,
3203 #ifdef LARGE_GENOMES
3204 Uint8list_T ambcoords,
3205 #else
3206 Uintlist_T ambcoords,
3207 #endif
3208 Intlist_T amb_knowni, Intlist_T amb_nmismatches, Doublelist_T amb_probs,
3209 double amb_common_prob, bool substring1p) {
3210 int ignore;
3211 T new = (T) MALLOC_OUT(sizeof(*new));
3212
3213 debug2(printf("Entered Substring_new_ambig_A with chrnum %d (chroffset %u, chrhigh %u), %d..%d, querylength %d, plusp %d\n",
3214 chrnum,chroffset,chrhigh,querystart,queryend,querylength,plusp));
3215
3216 new->exactp = false;
3217
3218 new->chrnum = chrnum;
3219 new->chroffset = chroffset;
3220 new->chrhigh = chrhigh;
3221 new->chrlength = chrlength;
3222
3223 new->left = 0;
3224 #ifdef LARGE_GENOMES
3225 if (plusp == true) {
3226 new->genomicstart = Uint8list_max(ambcoords);
3227 new->genomicend = Uint8list_min(ambcoords);
3228 } else {
3229 new->genomicstart = Uint8list_min(ambcoords);
3230 new->genomicend = Uint8list_max(ambcoords);
3231 }
3232 #else
3233 if (plusp == true) {
3234 new->genomicstart = Uintlist_max(ambcoords);
3235 new->genomicend = Uintlist_min(ambcoords);
3236 } else {
3237 new->genomicstart = Uintlist_min(ambcoords);
3238 new->genomicend = Uintlist_max(ambcoords);
3239 }
3240 #endif
3241 new->start_endtype = END;
3242 new->end_endtype = END;
3243
3244 new->querystart_orig = new->querystart = querystart;
3245 new->queryend_orig = new->queryend = queryend;
3246 new->querylength = querylength;
3247
3248 new->alignstart = new->alignstart_trim = 0;
3249 new->alignend = new->alignend_trim = 0;
3250
3251 new->plusp = plusp;
3252 new->genestrand = genestrand;
3253
3254 new->siteD_knownp = new->siteA_knownp = false;
3255 new->siteD_novelp = new->siteA_novelp = false;
3256
3257 new->siteA_prob = 0.0;
3258 new->siteD_prob = amb_common_prob;
3259
3260 new->nmismatches_bothdiff = new->nmismatches_whole = Intlist_min(amb_nmismatches);
3261
3262 #if 0
3263 if (plusp == true) {
3264 /* Fails because alignstart and alignend are not known */
3265 new->nmatches = (new->alignend_trim - new->alignstart_trim) - new->nmismatches_whole;
3266 } else {
3267 new->alignoffset = querylength - queryend;
3268 /* Fails because alignstart and alignend are not known */
3269 new->nmatches = (new->alignstart_trim - new->alignend_trim) - new->nmismatches_whole;
3270 }
3271 #endif
3272 new->nmatches = (queryend - querystart) - new->nmismatches_whole;
3273
3274 new->genomic_bothdiff = (char *) NULL;
3275 new->genomic_refdiff = (char *) NULL;
3276 if (substring1p == true) {
3277 debug2(printf("substring1p is true, so setting trims to be %d and %d\n",querystart,0));
3278 new->trim_left = querystart;
3279 new->trim_right = 0;
3280 } else {
3281 debug2(printf("substring1p is false, so setting trims to be %d and %d\n",0,querylength - queryend));
3282 new->trim_left = 0;
3283 new->trim_right = querylength - queryend;
3284 }
3285 new->mandatory_trim_left = 0;
3286 new->mandatory_trim_right = 0;
3287 new->trim_left_splicep = new->trim_right_splicep = false;
3288
3289
3290 new->ambiguous_p = true;
3291 #ifdef LARGE_GENOMES
3292 new->ambcoords = Uint8list_to_array_out(&new->nambcoords,ambcoords);
3293 #else
3294 new->ambcoords = Uintlist_to_array_out(&new->nambcoords,ambcoords);
3295 debug2(printf("ambcoords: %s\n",Uintlist_to_string(ambcoords)));
3296 #endif
3297 new->amb_knowni = Intlist_to_array_out(&ignore,amb_knowni);
3298 new->amb_nmismatches = Intlist_to_array_out(&ignore,amb_nmismatches);
3299 new->amb_probs = Doublelist_to_array_out(&ignore,amb_probs);
3300 new->amb_splice_pos = splice_pos;
3301 new->amb_type = ACC;
30793302
30803303 return new;
30813304 }
31083331 Substring_set_unambiguous (double *donor_prob, double *acceptor_prob, Univcoord_T *genomicstart, Univcoord_T *genomicend,
31093332 T this, int bingoi) {
31103333
3111 debug2(printf("Entered Substring_set_unambiguous\n"));
3112
3113 this->splicecoord = this->ambcoords[bingoi];
3114 this->splicesites_knowni = this->amb_knowni[bingoi];
3334 #ifdef DEBUG2
3335 printf("Entered Substring_set_unambiguous. plusp %d, ",this->plusp);
3336 if (this->amb_type == DON) {
3337 printf("type DON\n");
3338 } else {
3339 printf("type ACC\n");
3340 }
3341 #endif
3342
31153343 this->nmismatches_whole = this->amb_nmismatches[bingoi];
3116 this->chimera_prob = this->amb_probs[bingoi];
31173344
31183345 if (this->plusp == true) {
3119 this->left = this->splicecoord - this->amb_splice_pos;
3346 if (this->amb_type == DON) {
3347 *acceptor_prob = this->siteA_prob;
3348 *donor_prob = this->siteD_prob = this->amb_probs[bingoi];
3349 this->splicecoord_D = this->ambcoords[bingoi];
3350 this->splicesitesD_knowni = this->amb_knowni[bingoi];
3351 this->left = this->splicecoord_D - this->amb_splice_pos;
3352 } else {
3353 *donor_prob = this->siteD_prob;
3354 *acceptor_prob = this->siteA_prob = this->amb_probs[bingoi];
3355 this->splicecoord_A = this->ambcoords[bingoi];
3356 this->splicesitesA_knowni = this->amb_knowni[bingoi];
3357 this->left = this->splicecoord_A - this->amb_splice_pos;
3358 }
3359
31203360 debug2(printf("left %u\n",this->left));
31213361 *genomicstart = this->genomicstart = this->left;
31223362 *genomicend = this->genomicend = this->left + this->querylength;
31293369 this->alignend,this->alignend - this->chroffset,this->genomicstart,this->genomicend));
31303370
31313371 } else {
3132 this->left = this->splicecoord - (this->querylength - this->amb_splice_pos);
3372 if (this->amb_type == DON) {
3373 *acceptor_prob = this->siteA_prob;
3374 *donor_prob = this->siteD_prob = this->amb_probs[bingoi];
3375 this->splicecoord_D = this->ambcoords[bingoi];
3376 this->splicesitesD_knowni = this->amb_knowni[bingoi];
3377 this->left = this->splicecoord_D - (this->querylength - this->amb_splice_pos);
3378 } else {
3379 *donor_prob = this->siteD_prob;
3380 *acceptor_prob = this->siteA_prob = this->amb_probs[bingoi];
3381 this->splicecoord_A = this->ambcoords[bingoi];
3382 this->splicesitesA_knowni = this->amb_knowni[bingoi];
3383 this->left = this->splicecoord_A - (this->querylength - this->amb_splice_pos);
3384 }
3385
31333386 debug2(printf("left %u\n",this->left));
31343387 *genomicend = this->genomicend = this->left;
31353388 *genomicstart = this->genomicstart = this->left + this->querylength;
31403393 debug2(printf("querypos %d..%d, alignstart is %u (%u), alignend is %u (%u), genomicstart is %u, genomicend is %u\n",
31413394 this->querystart,this->queryend,this->alignstart,this->alignstart - this->chroffset,
31423395 this->alignend,this->alignend - this->chroffset,this->genomicstart,this->genomicend));
3143 }
3144
3145 if (this->amb_donor_common_p == true) {
3146 *donor_prob = this->amb_common_prob;
3147 *acceptor_prob = this->amb_probs[bingoi];
3148 } else {
3149 *acceptor_prob = this->amb_common_prob;
3150 *donor_prob = this->amb_probs[bingoi];
31513396 }
31523397
31533398 this->ambiguous_p = false;
32363481 int extraleft, int extraright,
32373482 Compress_T query_compress_fwd, Compress_T query_compress_rev,
32383483 Genome_T genome) {
3239 char *genomic_diff;
3240 char *gbuffer;
3241 #ifndef HAVE_ALLOCA
3484
3485 #if defined(LONG_READLENGTHS)
3486 char *genomic_diff, *gbuffer;
3487 #elif defined(HAVE_ALLOCA)
3488 char *genomic_diff, *gbuffer;
3489 #else
3490 char *genomic_diff, *gbuffer;
32423491 char gbuffer_alloc[MAX_READLENGTH/*+MAX_END_DELETIONS*/+1];
32433492 bool allocp;
32443493 #endif
32583507
32593508 } else {
32603509 /* Used to be this->genomiclength, but doesn't work for large insertions */
3261 #ifdef HAVE_ALLOCA
3510 #if defined(LONG_READLENGTHS)
3511 gbuffer = (char *) MALLOC((querylength+1) * sizeof(char));
3512 #elif defined(HAVE_ALLOCA)
32623513 gbuffer = (char *) ALLOCA((querylength+1) * sizeof(char));
32633514 #else
32643515 if (querylength < MAX_READLENGTH) {
33113562
33123563 if (0 && this->exactp == true && extraleft == 0 && extraright == 0) {
33133564 } else {
3314 #ifdef HAVE_ALLOCA
3565 #if defined(LONG_READLENGTHS)
3566 FREE(gbuffer);
3567 #elif defined(HAVE_ALLOCA)
33153568 FREEA(gbuffer);
33163569 #else
33173570 if (allocp == true) {
33293582
33303583 } else {
33313584 /* Used to be this->genomiclength, but doesn't work for large insertions */
3332 #ifdef HAVE_ALLOCA
3585 #if defined(LONG_READLENGTHS)
3586 gbuffer = (char *) MALLOC((querylength+1) * sizeof(char));
3587 #elif defined(HAVE_ALLOCA)
33333588 gbuffer = (char *) ALLOCA((querylength+1) * sizeof(char));
33343589 #else
33353590 if (querylength < MAX_READLENGTH) {
33853640
33863641 if (0 && this->exactp == true && extraleft == 0 && extraright == 0) {
33873642 } else {
3388 #ifdef HAVE_ALLOCA
3643 #if defined(LONG_READLENGTHS)
3644 FREE(gbuffer);
3645 #elif defined(HAVE_ALLOCA)
33893646 FREEA(gbuffer);
33903647 #else
33913648 if (allocp == true) {
34043661 return this->left;
34053662 }
34063663
3664
34073665 Univcoord_T
3408 Substring_splicecoord (T this) {
3409 return this->splicecoord;
3410 }
3411
3412 Chrpos_T
3413 Substring_chr_splicecoord (T this) {
3414 return (Chrpos_T) (this->splicecoord - this->chroffset);
3415 }
3416
3417 int
3418 Substring_splicesites_knowni (T this) {
3419 return this->splicesites_knowni;
3666 Substring_splicecoord_D (T this) {
3667 return this->splicecoord_D;
34203668 }
34213669
34223670 Univcoord_T
34233671 Substring_splicecoord_A (T this) {
3424 return this->splicecoord;
3425 }
3426
3427 Univcoord_T
3428 Substring_splicecoord_D (T this) {
3429 return this->splicecoord_2;
3672 return this->splicecoord_A;
3673 }
3674
3675 Chrpos_T
3676 Substring_chr_splicecoord_D (T this) {
3677 return (Chrpos_T) (this->splicecoord_D - this->chroffset);
3678 }
3679
3680 Chrpos_T
3681 Substring_chr_splicecoord_A (T this) {
3682 return (Chrpos_T) (this->splicecoord_A - this->chroffset);
3683 }
3684
3685 int
3686 Substring_splicesitesD_knowni (T this) {
3687 return this->splicesitesD_knowni;
3688 }
3689
3690 int
3691 Substring_splicesitesA_knowni (T this) {
3692 return this->splicesitesA_knowni;
34303693 }
34313694
34323695 bool
37434006 double max;
37444007 int i;
37454008
3746 if (this->amb_donor_common_p == true) {
3747 return this->amb_common_prob;
4009 if (this->amb_type == DON) {
4010 return this->siteD_prob;
37484011 } else {
37494012 max = this->amb_probs[0];
37504013 for (i = 1; i < this->nambcoords; i++) {
37614024 double max;
37624025 int i;
37634026
3764 if (this->amb_donor_common_p == true) {
4027 if (this->amb_type == ACC) {
4028 return this->siteA_prob;
4029 } else {
37654030 max = this->amb_probs[0];
37664031 for (i = 1; i < this->nambcoords; i++) {
37674032 if (this->amb_probs[i] > max) {
37694034 }
37704035 }
37714036 return max;
3772 } else {
3773 return this->amb_common_prob;
3774 }
3775 }
3776
3777
4037 }
4038 }
4039
4040
4041
4042 double
4043 Substring_siteD_prob (T this) {
4044 return this->siteD_prob;
4045 }
37784046
37794047 double
37804048 Substring_siteA_prob (T this) {
37814049 return this->siteA_prob;
37824050 }
37834051
3784 double
3785 Substring_siteD_prob (T this) {
3786 return this->siteD_prob;
3787 }
3788
3789
3790 double
3791 Substring_chimera_prob (T this) {
3792 return this->chimera_prob;
3793 }
3794
3795 double
3796 Substring_chimera_prob_2 (T this) {
3797 return this->chimera_prob_2;
3798 }
37994052
38004053 int
3801 Substring_chimera_pos (T this) {
3802 return this->chimera_pos;
3803 }
3804
3805 /* For shortexon */
4054 Substring_siteD_pos (T this) {
4055 return this->siteD_pos;
4056 }
4057
38064058 int
3807 Substring_chimera_pos_A (T this) {
3808 return this->chimera_pos;
3809 }
3810
3811 /* For shortexon */
4059 Substring_siteA_pos (T this) {
4060 return this->siteA_pos;
4061 }
4062
38124063 int
3813 Substring_chimera_pos_D (T this) {
3814 return this->chimera_pos_2;
3815 }
3816
3817 bool
3818 Substring_chimera_knownp (T this) {
3819 return this->chimera_knownp;
3820 }
4064 Substring_siteN_pos (T this) {
4065 return this->siteN_pos;
4066 }
4067
38214068
38224069 int
38234070 Substring_nchimera_known (T this) {
38244071 if (this == NULL) {
38254072 return 0;
38264073 } else {
3827 return (int) this->chimera_knownp + (int) this->chimera_knownp_2;
4074 return (int) this->siteD_knownp + (int) this->siteA_knownp;
38284075 }
38294076 }
38304077
38334080 if (this == NULL) {
38344081 return 0;
38354082 } else {
3836 return (int) this->chimera_novelp + (int) this->chimera_novelp_2;
4083 return (int) this->siteD_novelp + (int) this->siteA_novelp;
38374084 }
38384085 }
38394086
38504097 return this->ambiguous_p;
38514098 }
38524099
4100 bool
4101 Substring_list_ambiguous_p (List_T list) {
4102 T this;
4103 List_T p;
4104
4105 for (p = list; p != NULL; p = List_next(p)) {
4106 this = (T) List_head(p);
4107 if (this->ambiguous_p == true) {
4108 return true;
4109 }
4110 }
4111 return false;
4112 }
4113
38534114 int
38544115 Substring_nambcoords (T this) {
38554116 return this->nambcoords;
38744135 Substring_amb_probs (T this) {
38754136 return this->amb_probs;
38764137 }
3877
3878
38794138
38804139
38814140
39794238
39804239 new->chimera_sensedir = old->chimera_sensedir;
39814240
3982 new->splicecoord = old->splicecoord;
3983 new->splicesites_knowni = old->splicesites_knowni;
3984 new->chimera_knownp = old->chimera_knownp;
3985 new->chimera_novelp = old->chimera_novelp;
3986 new->chimera_modelpos = old->chimera_modelpos;
3987 new->chimera_pos = old->chimera_pos;
3988 new->chimera_prob = old->chimera_prob;
3989
3990 new->splicecoord_2 = old->splicecoord_2;
3991 new->splicesites_knowni_2 = old->splicesites_knowni_2;
3992 new->chimera_knownp_2 = old->chimera_knownp_2;
3993 new->chimera_novelp_2 = old->chimera_novelp_2;
3994 new->chimera_modelpos_2 = old->chimera_modelpos_2;
3995 new->chimera_pos_2 = old->chimera_pos_2;
3996 new->chimera_prob_2 = old->chimera_prob_2;
4241 new->splicecoord_D = old->splicecoord_D;
4242 new->splicesitesD_knowni = old->splicesitesD_knowni;
4243 new->siteD_knownp = old->siteD_knownp;
4244 new->siteD_novelp = old->siteD_novelp;
4245 new->siteD_pos = old->siteD_pos;
4246 new->siteD_prob = old->siteD_prob;
4247
4248 new->splicecoord_A = old->splicecoord_A;
4249 new->splicesitesA_knowni = old->splicesitesA_knowni;
4250 new->siteA_knownp = old->siteA_knownp;
4251 new->siteA_novelp = old->siteA_novelp;
4252 new->siteA_pos = old->siteA_pos;
4253 new->siteA_prob = old->siteA_prob;
4254
4255 new->splicecoord_N = old->splicecoord_N;
4256 new->siteN_pos = old->siteN_pos;
39974257
39984258 new->ambiguous_p = old->ambiguous_p;
39994259 if (old->nambcoords == 0) {
40024262 new->amb_knowni = (int *) NULL;
40034263 new->amb_nmismatches = (int *) NULL;
40044264 new->amb_probs = (double *) NULL;
4005 new->amb_common_prob = 0.0;
4006 new->amb_donor_common_p = false;
40074265 } else {
40084266 new->nambcoords = old->nambcoords;
40094267 new->ambcoords = (Univcoord_T *) MALLOC_OUT(old->nambcoords * sizeof(Univcoord_T));
40104268 new->amb_knowni = (int *) MALLOC_OUT(old->nambcoords * sizeof(int));
40114269 new->amb_nmismatches = (int *) MALLOC_OUT(old->nambcoords * sizeof(int));
40124270 new->amb_probs = (double *) MALLOC_OUT(old->nambcoords * sizeof(double));
4013 new->amb_common_prob = old->amb_common_prob;
4014 new->amb_donor_common_p = old->amb_donor_common_p;
40154271
40164272 memcpy(new->ambcoords,old->ambcoords,old->nambcoords * sizeof(Univcoord_T));
40174273 memcpy(new->amb_knowni,old->amb_knowni,old->nambcoords * sizeof(int));
40184274 memcpy(new->amb_nmismatches,old->amb_nmismatches,old->nambcoords * sizeof(int));
40194275 memcpy(new->amb_probs,old->amb_probs,old->nambcoords * sizeof(double));
40204276 }
4277 new->amb_type = old->amb_type;
40214278
40224279 return new;
40234280 }
40834340
40844341 debug2(printf("Making new startfrag with coord %u and left %u, plusp %d, query %d..%d, genome %u..%u\n",
40854342 startfrag_coord,left,plusp,querystart,queryend,alignstart - chroffset,alignend - chroffset));
4086 new->splicecoord = startfrag_coord;
4087 new->splicesites_knowni = -1;
4088
4089 new->chimera_modelpos = left + splice_pos;
4090 assert(new->splicecoord == new->chimera_modelpos);
4343 new->splicecoord_N = startfrag_coord;
4344 assert(startfrag_coord == left + splice_pos);
4345
40914346 new->chimera_sensedir = SENSE_NULL;
4092 /* new->chimera_knownp = false; */
4093 new->chimera_novelp = true;
40944347
40954348 if (plusp == true) {
4096 new->chimera_pos = splice_pos;
4097 } else {
4098 new->chimera_pos = querylength - splice_pos;
4099 }
4100 new->chimera_prob = 0.0;
4101
4102 new->siteA_prob = 0.0;
4103 new->siteD_prob = 0.0;
4349 new->siteN_pos = splice_pos;
4350 } else {
4351 new->siteN_pos = querylength - splice_pos;
4352 }
41044353
41054354 return new;
41064355 }
41654414
41664415 debug2(printf("Making new endfrag with coord %u and left %u, plusp %d, query %d..%d, genome %u..%u\n",
41674416 endfrag_coord,left,plusp,querystart,queryend,alignstart - chroffset,alignend - chroffset));
4168 new->splicecoord = endfrag_coord;
4169 new->splicesites_knowni = -1;
4170
4171 new->chimera_modelpos = left + splice_pos;
4172 assert(new->splicecoord == new->chimera_modelpos);
4417 new->splicecoord_N = endfrag_coord;
4418 assert(endfrag_coord == left + splice_pos);
4419
41734420 new->chimera_sensedir = SENSE_NULL;
4174 /* new->chimera_knownp = false; */
4175 new->chimera_novelp = true;
41764421
41774422 if (plusp == true) {
4178 new->chimera_pos = splice_pos;
4179 } else {
4180 new->chimera_pos = querylength - splice_pos;
4181 }
4182 new->chimera_prob = 0.0;
4183
4184 new->siteA_prob = 0.0;
4185 new->siteD_prob = 0.0;
4423 new->siteN_pos = splice_pos;
4424 } else {
4425 new->siteN_pos = querylength - splice_pos;
4426 }
41864427
41874428 return new;
41884429 }
42194460
42204461 querystart = substring_querystart; /* 0, for an end piece */
42214462 queryend = donor_pos;
4463 #if 0
42224464 if (querystart == 0) {
42234465 trim_left_action = COMPUTE_TRIM; /* querystart == 0 */
42244466 } else {
42254467 trim_left_action = PRE_TRIMMED;
42264468 }
4469 #else
4470 trim_left_action = COMPUTE_TRIM;
4471 #endif
42274472 trim_right_action = NO_TRIM;
42284473
42294474 } else if (sensedir == SENSE_ANTI) {
42334478 querystart = donor_pos;
42344479 queryend = substring_queryend; /* querylength, for an end piece */
42354480 trim_left_action = NO_TRIM;
4481 #if 0
42364482 if (queryend == querylength) {
42374483 trim_right_action = COMPUTE_TRIM; /* queryend == querylength */
42384484 } else {
42394485 trim_right_action = PRE_TRIMMED;
42404486 }
4487 #else
4488 trim_right_action = COMPUTE_TRIM;
4489 #endif
42414490
42424491 } else {
42434492 abort();
42554504
42564505 querystart = substring_querystart; /* 0, for an end piece */
42574506 queryend = querylength - donor_pos;
4507 #if 0
42584508 if (querystart == 0) {
42594509 trim_left_action = COMPUTE_TRIM; /* querystart == 0 */
42604510 } else {
42614511 trim_left_action = PRE_TRIMMED;
42624512 }
4513 #else
4514 trim_left_action = COMPUTE_TRIM;
4515 #endif
42634516 trim_right_action = NO_TRIM;
42644517
42654518 } else if (sensedir == SENSE_ANTI) {
42694522 querystart = querylength - donor_pos;
42704523 queryend = substring_queryend; /* querylength, for an end piece */
42714524 trim_left_action = NO_TRIM;
4525 #if 0
42724526 if (queryend == querylength) {
42734527 trim_right_action = COMPUTE_TRIM; /* queryend == querylength */
42744528 } else {
42754529 trim_right_action = PRE_TRIMMED;
42764530 }
4531 #else
4532 trim_right_action = COMPUTE_TRIM;
4533 #endif
42774534
42784535 } else {
42794536 abort();
42914548 return (T) NULL;
42924549 }
42934550
4294 debug2(printf("Making new donor with splicesites_i %d, coord %u and left %u, plusp %d, sensedir %d, query %d..%d, genome %u..%u\n",
4295 donor_knowni,donor_coord,left,plusp,sensedir,querystart,queryend,alignstart - chroffset,alignend - chroffset));
4296 new->splicecoord = donor_coord;
4297 new->splicesites_knowni = donor_knowni;
4298
4299 new->chimera_modelpos = left + donor_pos;
4300 assert(new->splicecoord == new->chimera_modelpos);
4551 debug2(printf("Making new donor with splicesites_i %d, coord %u and left %u, plusp %d, sensedir %d, query %d..%d, trim %d..%d, genome %u..%u\n",
4552 donor_knowni,donor_coord,left,plusp,sensedir,new->querystart,new->queryend,
4553 new->trim_left,new->trim_right,alignstart - chroffset,alignend - chroffset));
4554 debug2(printf("Original bounds were %d..%d\n",substring_querystart,substring_queryend));
4555 debug2(printf("Setting siteD_prob to be %f\n",donor_prob));
4556
4557 new->splicecoord_D = donor_coord;
4558 new->splicesitesD_knowni = donor_knowni;
4559 assert(donor_coord == left + donor_pos);
4560
43014561 new->chimera_sensedir = sensedir;
43024562 if (donor_knowni >= 0) {
4303 new->chimera_knownp = true;
4563 new->siteD_knownp = true;
43044564 /* new->chimera_novelp = false */
43054565 } else {
4306 /* new->chimera_knownp = false; */
4307 new->chimera_novelp = true;
4566 /* new->siteD_knownp = false; */
4567 new->siteD_novelp = true;
43084568 }
43094569
43104570 if (plusp == true) {
4311 new->chimera_pos = donor_pos;
4312 } else {
4313 new->chimera_pos = querylength - donor_pos;
4314 }
4315 new->chimera_prob = donor_prob;
4316
4317 new->siteA_prob = 0.0;
4571 new->siteD_pos = donor_pos;
4572 } else {
4573 new->siteD_pos = querylength - donor_pos;
4574 }
43184575 new->siteD_prob = donor_prob;
43194576
43204577 return new;
43524609 querystart = acceptor_pos;
43534610 queryend = substring_queryend; /* querylength, for an end piece */
43544611 trim_left_action = NO_TRIM;
4612 #if 0
43554613 if (queryend == querylength) {
43564614 trim_right_action = COMPUTE_TRIM; /* queryend == querylength */
43574615 } else {
43584616 trim_right_action = PRE_TRIMMED;
43594617 }
4618 #else
4619 trim_right_action = COMPUTE_TRIM;
4620 #endif
43604621
43614622 } else if (sensedir == SENSE_ANTI) {
43624623 start_endtype = END;
43644625
43654626 querystart = substring_querystart; /* 0, for an end piece */
43664627 queryend = acceptor_pos;
4628 #if 0
43674629 if (querystart == 0) {
43684630 trim_left_action = COMPUTE_TRIM; /* querystart == 0 */
43694631 } else {
43704632 trim_left_action = PRE_TRIMMED;
43714633 }
4634 #else
4635 trim_left_action = COMPUTE_TRIM;
4636 #endif
43724637 trim_right_action = NO_TRIM;
43734638
43744639 } else {
43884653 querystart = querylength - acceptor_pos;
43894654 queryend = substring_queryend; /* querylength, for an end piece */
43904655 trim_left_action = NO_TRIM;
4656 #if 0
43914657 if (queryend == querylength) {
43924658 trim_right_action = COMPUTE_TRIM; /* queryend == querylength */
43934659 } else {
43944660 trim_right_action = PRE_TRIMMED;
43954661 }
4662 #else
4663 trim_right_action = COMPUTE_TRIM;
4664 #endif
43964665
43974666 } else if (sensedir == SENSE_ANTI) {
43984667 start_endtype = END;
44004669
44014670 querystart = substring_querystart; /* 0, for an end piece */
44024671 queryend = querylength - acceptor_pos;
4672 #if 0
44034673 if (querystart == 0) {
44044674 trim_left_action = COMPUTE_TRIM; /* querystart == 0 */
44054675 } else {
44064676 trim_left_action = PRE_TRIMMED;
44074677 }
4678 #else
4679 trim_left_action = COMPUTE_TRIM;
4680 #endif
44084681 trim_right_action = NO_TRIM;
44094682
44104683 } else {
44234696 return (T) NULL;
44244697 }
44254698
4426 debug2(printf("Making new acceptor with splicesites_i %d, coord %u and left %u, plusp %d, sensedir %d, query %d..%d, genome %u..%u\n",
4427 acceptor_knowni,acceptor_coord,left,plusp,sensedir,querystart,queryend,alignstart - chroffset,alignend - chroffset));
4699 debug2(printf("Making new acceptor with splicesites_i %d, coord %u and left %u, plusp %d, sensedir %d, query %d..%d, trim %d..%d, genome %u..%u\n",
4700 acceptor_knowni,acceptor_coord,left,plusp,sensedir,new->querystart,new->queryend,
4701 new->trim_left,new->trim_right,alignstart - chroffset,alignend - chroffset));
44284702 debug2(printf("Original bounds were %d..%d\n",substring_querystart,substring_queryend));
4429
4430 new->splicecoord = acceptor_coord;
4431 new->splicesites_knowni = acceptor_knowni;
4432
4433 new->chimera_modelpos = left + acceptor_pos;
4434 assert(new->splicecoord == new->chimera_modelpos);
4703 debug2(printf("Setting siteA_prob to be %f\n",acceptor_prob));
4704
4705 new->splicecoord_A = acceptor_coord;
4706 new->splicesitesA_knowni = acceptor_knowni;
4707 assert(acceptor_coord == left + acceptor_pos);
4708
44354709 new->chimera_sensedir = sensedir;
44364710 if (acceptor_knowni >= 0) {
4437 new->chimera_knownp = true;
4711 new->siteA_knownp = true;
44384712 /* new->chimera_novelp = false */
44394713 } else {
44404714 /* new->chimera_knownp = false; */
4441 new->chimera_novelp = true;
4715 new->siteA_novelp = true;
44424716 }
44434717
44444718 if (plusp == true) {
4445 new->chimera_pos = acceptor_pos;
4446 } else {
4447 new->chimera_pos = querylength - acceptor_pos;
4448 }
4449 new->chimera_prob = acceptor_prob;
4450
4719 new->siteA_pos = acceptor_pos;
4720 } else {
4721 new->siteA_pos = querylength - acceptor_pos;
4722 }
44514723 new->siteA_prob = acceptor_prob;
4452 new->siteD_prob = 0.0;
44534724
44544725 return new;
44554726 }
45254796 }
45264797
45274798 debug2(printf("Making new middle with left %u, plusp %d\n",left,plusp));
4528 new->splicecoord = acceptor_coord;
4529 new->splicesites_knowni = acceptor_knowni;
4530 new->splicecoord_2 = donor_coord;
4531 new->splicesites_knowni_2 = donor_knowni;
4532
4533 new->chimera_modelpos = left + acceptor_pos;
4534 new->chimera_modelpos_2 = left + donor_pos;
4799 new->splicecoord_A = acceptor_coord;
4800 new->splicesitesA_knowni = acceptor_knowni;
4801 new->splicecoord_D = donor_coord;
4802 new->splicesitesD_knowni = donor_knowni;
4803
45354804 new->chimera_sensedir = sensedir;
45364805
45374806 if (acceptor_knowni >= 0) {
4538 new->chimera_knownp = true;
4807 new->siteA_knownp = true;
45394808 /* new->chimera_novelp = false; */
45404809 } else {
45414810 /* new->chimera_knownp = false; */
4542 new->chimera_novelp = true;
4811 new->siteA_novelp = true;
45434812 }
45444813
45454814 if (donor_knowni >= 0) {
4546 new->chimera_knownp_2 = true;
4815 new->siteD_knownp = true;
45474816 /* new->chimera_novelp_2 = false; */
45484817 } else {
4549 /* new->chimera_knownp_2 = false; */
4550 new->chimera_novelp_2 = true;
4818 /* new->siteD_knownp = false; */
4819 new->siteD_novelp = true;
45514820 }
45524821
45534822 if (plusp == true) {
4554 new->chimera_pos = acceptor_pos;
4555 new->chimera_pos_2 = donor_pos;
4556 } else {
4557 new->chimera_pos = querylength - acceptor_pos;
4558 new->chimera_pos_2 = querylength - donor_pos;
4559 }
4560
4561 new->chimera_prob = acceptor_prob;
4562 new->chimera_prob_2 = donor_prob;
4823 new->siteA_pos = acceptor_pos;
4824 new->siteD_pos = donor_pos;
4825 } else {
4826 new->siteA_pos = querylength - acceptor_pos;
4827 new->siteD_pos = querylength - donor_pos;
4828 }
45634829
45644830 new->siteA_prob = acceptor_prob;
45654831 new->siteD_prob = donor_prob;
45754841 if (donor == NULL) {
45764842 return;
45774843
4578 } else if (donor->chimera_knownp == false) {
4844 } else if (donor->siteD_knownp == false) {
45794845 /* Prob already assigned */
45804846
45814847 } else if (donor->chimera_sensedir == SENSE_FORWARD) {
45824848 if (donor->plusp == true) {
4583 donor->chimera_prob = Maxent_hr_donor_prob(donor->splicecoord,donor->chroffset);
4584 } else {
4585 donor->chimera_prob = Maxent_hr_antidonor_prob(donor->splicecoord,donor->chroffset);
4849 donor->siteD_prob = Maxent_hr_donor_prob(donor->splicecoord_D,donor->chroffset);
4850 } else {
4851 donor->siteD_prob = Maxent_hr_antidonor_prob(donor->splicecoord_D,donor->chroffset);
45864852 }
45874853
45884854 } else if (donor->chimera_sensedir == SENSE_ANTI) {
45894855 if (donor->plusp == true) {
4590 donor->chimera_prob = Maxent_hr_antidonor_prob(donor->splicecoord,donor->chroffset);
4591 } else {
4592 donor->chimera_prob = Maxent_hr_donor_prob(donor->splicecoord,donor->chroffset);
4856 donor->siteD_prob = Maxent_hr_antidonor_prob(donor->splicecoord_D,donor->chroffset);
4857 } else {
4858 donor->siteD_prob = Maxent_hr_donor_prob(donor->splicecoord_D,donor->chroffset);
45934859 }
45944860
45954861 } else {
45964862 /* SENSE_NULL */
4597 donor->chimera_prob = 0.0;
4863 donor->siteD_prob = 0.0;
45984864 }
45994865
46004866 return;
46064872 if (acceptor == NULL) {
46074873 return;
46084874
4609 } else if (acceptor->chimera_knownp == false) {
4875 } else if (acceptor->siteA_knownp == false) {
46104876 /* Prob already assigned */
46114877
46124878 } else if (acceptor->chimera_sensedir == SENSE_FORWARD) {
46134879 if (acceptor->plusp == true) {
4614 acceptor->chimera_prob = Maxent_hr_acceptor_prob(acceptor->splicecoord,acceptor->chroffset);
4615 } else {
4616 acceptor->chimera_prob = Maxent_hr_antiacceptor_prob(acceptor->splicecoord,acceptor->chroffset);
4880 acceptor->siteA_prob = Maxent_hr_acceptor_prob(acceptor->splicecoord_A,acceptor->chroffset);
4881 } else {
4882 acceptor->siteA_prob = Maxent_hr_antiacceptor_prob(acceptor->splicecoord_A,acceptor->chroffset);
46174883 }
46184884
46194885 } else if (acceptor->chimera_sensedir == SENSE_ANTI) {
46204886 if (acceptor->plusp == true) {
4621 acceptor->chimera_prob = Maxent_hr_antiacceptor_prob(acceptor->splicecoord,acceptor->chroffset);
4622 } else {
4623 acceptor->chimera_prob = Maxent_hr_acceptor_prob(acceptor->splicecoord,acceptor->chroffset);
4887 acceptor->siteA_prob = Maxent_hr_antiacceptor_prob(acceptor->splicecoord_A,acceptor->chroffset);
4888 } else {
4889 acceptor->siteA_prob = Maxent_hr_acceptor_prob(acceptor->splicecoord_A,acceptor->chroffset);
46244890 }
46254891
46264892 } else {
46274893 /* SENSE_NULL */
4628 acceptor->chimera_prob = 0.0;
4894 acceptor->siteA_prob = 0.0;
46294895 }
46304896
46314897 return;
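The nested branches in Substring_assign_donor_prob and Substring_assign_acceptor_prob above follow one rule: the sense model is used when the sense direction and the strand agree, the anti model when they disagree, and no model for SENSE_NULL. A minimal sketch of that rule follows. The names `Sense_T`, `Model_T`, and `choose_model` are hypothetical stand-ins and are not part of GMAP; the real code calls the Maxent_hr_* functions directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SENSE_* constants seen in the diff. */
typedef enum { SENSE_NULL, SENSE_ANTI, SENSE_FORWARD } Sense_T;

/* Which scoring model would be consulted; illustrative only. */
typedef enum { MODEL_NONE, MODEL_SENSE, MODEL_ANTISENSE } Model_T;

static Model_T
choose_model (Sense_T sensedir, bool plusp) {
  if (sensedir == SENSE_NULL) {
    /* The diff assigns a probability of 0.0 in this case */
    return MODEL_NONE;
  }
  assert(sensedir == SENSE_FORWARD || sensedir == SENSE_ANTI);

  /* Forward sense on the plus strand, or antisense on the minus strand,
     selects the sense model; the two mixed cases select the anti model */
  return ((sensedir == SENSE_FORWARD) == plusp) ? MODEL_SENSE : MODEL_ANTISENSE;
}
```

For a donor site, MODEL_SENSE would correspond to Maxent_hr_donor_prob and MODEL_ANTISENSE to Maxent_hr_antidonor_prob. Note that Substring_assign_shortexon_prob differs in the SENSE_NULL case: it calls abort() instead of assigning 0.0.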
46354901 void
46364902 Substring_assign_shortexon_prob (T shortexon) {
46374903
4638 if (shortexon->chimera_knownp == false) {
4904 if (shortexon->siteA_knownp == false) {
46394905 /* Prob1 already assigned */
46404906
46414907 } else if (shortexon->chimera_sensedir == SENSE_FORWARD) {
46424908 if (shortexon->plusp == true) {
4643 shortexon->chimera_prob = Maxent_hr_acceptor_prob(shortexon->chimera_modelpos,shortexon->chroffset);
4644 } else {
4645 shortexon->chimera_prob = Maxent_hr_antiacceptor_prob(shortexon->chimera_modelpos,shortexon->chroffset);
4909 shortexon->siteA_prob = Maxent_hr_acceptor_prob(shortexon->splicecoord_A,shortexon->chroffset);
4910 } else {
4911 shortexon->siteA_prob = Maxent_hr_antiacceptor_prob(shortexon->splicecoord_A,shortexon->chroffset);
46464912 }
46474913
46484914 } else if (shortexon->chimera_sensedir == SENSE_ANTI) {
46494915 if (shortexon->plusp == true) {
4650 shortexon->chimera_prob = Maxent_hr_antiacceptor_prob(shortexon->chimera_modelpos,shortexon->chroffset);
4651 } else {
4652 shortexon->chimera_prob = Maxent_hr_acceptor_prob(shortexon->chimera_modelpos,shortexon->chroffset);
4916 shortexon->siteA_prob = Maxent_hr_antiacceptor_prob(shortexon->splicecoord_A,shortexon->chroffset);
4917 } else {
4918 shortexon->siteA_prob = Maxent_hr_acceptor_prob(shortexon->splicecoord_A,shortexon->chroffset);
46534919 }
46544920
46554921 } else {
46564922 abort();
46574923 }
46584924
4659 if (shortexon->chimera_knownp_2 == false) {
4925 if (shortexon->siteD_knownp == false) {
46604926 /* Prob2 already assigned */
46614927
46624928 } else if (shortexon->chimera_sensedir == SENSE_FORWARD) {
46634929 if (shortexon->plusp == true) {
4664 shortexon->chimera_prob_2 = Maxent_hr_donor_prob(shortexon->chimera_modelpos_2,shortexon->chroffset);
4665 } else {
4666 shortexon->chimera_prob_2 = Maxent_hr_antidonor_prob(shortexon->chimera_modelpos_2,shortexon->chroffset);
4930 shortexon->siteD_prob = Maxent_hr_donor_prob(shortexon->splicecoord_D,shortexon->chroffset);
4931 } else {
4932 shortexon->siteD_prob = Maxent_hr_antidonor_prob(shortexon->splicecoord_D,shortexon->chroffset);
46674933 }
46684934
46694935 } else if (shortexon->chimera_sensedir == SENSE_ANTI) {
46704936 if (shortexon->plusp == true) {
4671 shortexon->chimera_prob_2 = Maxent_hr_antidonor_prob(shortexon->chimera_modelpos_2,shortexon->chroffset);
4672 } else {
4673 shortexon->chimera_prob_2 = Maxent_hr_donor_prob(shortexon->chimera_modelpos_2,shortexon->chroffset);
4937 shortexon->siteD_prob = Maxent_hr_antidonor_prob(shortexon->splicecoord_D,shortexon->chroffset);
4938 } else {
4939 shortexon->siteD_prob = Maxent_hr_donor_prob(shortexon->splicecoord_D,shortexon->chroffset);
46744940 }
46754941
46764942 } else {
46834949
46844950
46854951 static int
4686 ascending_pos_cmp (const void *a, const void *b) {
4952 ascending_siteD_pos_cmp (const void *a, const void *b) {
46874953 T x = * (T *) a;
46884954 T y = * (T *) b;
46894955
4690 if (x->chimera_pos < y->chimera_pos) {
4956 if (x->siteD_pos < y->siteD_pos) {
46914957 return -1;
4692 } else if (x->chimera_pos > y->chimera_pos) {
4958 } else if (x->siteD_pos > y->siteD_pos) {
46934959 return +1;
46944960 } else if (x->genomicstart < y->genomicstart) {
46954961 return -1;
46964962 } else if (x->genomicstart > y->genomicstart) {
46974963 return +1;
4698 } else if (x->chimera_knownp == true && y->chimera_knownp == false) {
4964 } else if (x->siteD_knownp == true && y->siteD_knownp == false) {
46994965 return -1;
4700 } else if (y->chimera_knownp == true && x->chimera_knownp == false) {
4966 } else if (y->siteD_knownp == true && x->siteD_knownp == false) {
47014967 return +1;
47024968 } else {
47034969 return 0;
47054971 }
47064972
47074973 static int
4708 descending_pos_cmp (const void *a, const void *b) {
4974 ascending_siteA_pos_cmp (const void *a, const void *b) {
47094975 T x = * (T *) a;
47104976 T y = * (T *) b;
47114977
4712 if (x->chimera_pos < y->chimera_pos) {
4978 if (x->siteA_pos < y->siteA_pos) {
47134979 return -1;
4714 } else if (x->chimera_pos > y->chimera_pos) {
4980 } else if (x->siteA_pos > y->siteA_pos) {
4981 return +1;
4982 } else if (x->genomicstart < y->genomicstart) {
4983 return -1;
4984 } else if (x->genomicstart > y->genomicstart) {
4985 return +1;
4986 } else if (x->siteA_knownp == true && y->siteA_knownp == false) {
4987 return -1;
4988 } else if (y->siteA_knownp == true && x->siteA_knownp == false) {
4989 return +1;
4990 } else {
4991 return 0;
4992 }
4993 }
4994
4995 static int
4996 ascending_siteN_pos_cmp (const void *a, const void *b) {
4997 T x = * (T *) a;
4998 T y = * (T *) b;
4999
5000 if (x->siteN_pos < y->siteN_pos) {
5001 return -1;
5002 } else if (x->siteN_pos > y->siteN_pos) {
5003 return +1;
5004 } else if (x->genomicstart < y->genomicstart) {
5005 return -1;
5006 } else if (x->genomicstart > y->genomicstart) {
5007 return +1;
5008 } else {
5009 return 0;
5010 }
5011 }
5012
5013 static int
5014 descending_siteD_pos_cmp (const void *a, const void *b) {
5015 T x = * (T *) a;
5016 T y = * (T *) b;
5017
5018 if (x->siteD_pos < y->siteD_pos) {
5019 return -1;
5020 } else if (x->siteD_pos > y->siteD_pos) {
47155021 return +1;
47165022 } else if (x->genomicstart > y->genomicstart) {
47175023 return -1;
47185024 } else if (x->genomicstart < y->genomicstart) {
47195025 return +1;
4720 } else if (x->chimera_knownp == true && y->chimera_knownp == false) {
5026 } else if (x->siteD_knownp == true && y->siteD_knownp == false) {
47215027 return -1;
4722 } else if (y->chimera_knownp == true && x->chimera_knownp == false) {
5028 } else if (y->siteD_knownp == true && x->siteD_knownp == false) {
47235029 return +1;
47245030 } else {
47255031 return 0;
47265032 }
47275033 }
47285034
5035 static int
5036 descending_siteA_pos_cmp (const void *a, const void *b) {
5037 T x = * (T *) a;
5038 T y = * (T *) b;
5039
5040 if (x->siteA_pos < y->siteA_pos) {
5041 return -1;
5042 } else if (x->siteA_pos > y->siteA_pos) {
5043 return +1;
5044 } else if (x->genomicstart > y->genomicstart) {
5045 return -1;
5046 } else if (x->genomicstart < y->genomicstart) {
5047 return +1;
5048 } else if (x->siteA_knownp == true && y->siteA_knownp == false) {
5049 return -1;
5050 } else if (y->siteA_knownp == true && x->siteA_knownp == false) {
5051 return +1;
5052 } else {
5053 return 0;
5054 }
5055 }
5056
5057 static int
5058 descending_siteN_pos_cmp (const void *a, const void *b) {
5059 T x = * (T *) a;
5060 T y = * (T *) b;
5061
5062 if (x->siteN_pos < y->siteN_pos) {
5063 return -1;
5064 } else if (x->siteN_pos > y->siteN_pos) {
5065 return +1;
5066 } else if (x->genomicstart > y->genomicstart) {
5067 return -1;
5068 } else if (x->genomicstart < y->genomicstart) {
5069 return +1;
5070 } else {
5071 return 0;
5072 }
5073 }
5074
47295075 List_T
4730 Substring_sort_chimera_halves (List_T hitlist, bool ascendingp) {
5076 Substring_sort_siteD_halves (List_T hitlist, bool ascendingp) {
47315077 List_T sorted = NULL;
47325078 T x, *hits;
47335079 int n, i, j;
47445090 List_fill_array_and_free((void **) hits,&hitlist);
47455091
47465092 if (ascendingp == true) {
4747 qsort(hits,n,sizeof(T),ascending_pos_cmp);
4748 } else {
4749 qsort(hits,n,sizeof(T),descending_pos_cmp);
5093 qsort(hits,n,sizeof(T),ascending_siteD_pos_cmp);
5094 } else {
5095 qsort(hits,n,sizeof(T),descending_siteD_pos_cmp);
47505096 }
47515097
47525098 /* Check for duplicates */
47545100 for (i = 0; i < n; i++) {
47555101 x = hits[i];
47565102 j = i+1;
4757 while (j < n && hits[j]->chimera_pos == x->chimera_pos && hits[j]->genomicstart == x->genomicstart) {
5103 while (j < n && hits[j]->siteD_pos == x->siteD_pos && hits[j]->genomicstart == x->genomicstart) {
5104 eliminate[j] = true;
5105 j++;
5106 }
5107 }
5108
5109 debug(j = 0);
5110 for (i = n-1; i >= 0; i--) {
5111 x = hits[i];
5112 if (eliminate[i] == false) {
5113 sorted = List_push(sorted,x);
5114 } else {
5115 Substring_free(&x);
5116 debug(j++);
5117 }
5118 }
5119 debug(printf("%d eliminated\n",j));
5120
5121 FREEA(hits);
5122 FREEA(eliminate);
5123
5124 return sorted;
5125 }
5126
5127 List_T
5128 Substring_sort_siteA_halves (List_T hitlist, bool ascendingp) {
5129 List_T sorted = NULL;
5130 T x, *hits;
5131 int n, i, j;
5132 bool *eliminate;
5133
5134 n = List_length(hitlist);
5135 debug(printf("Checking %d spliceends for duplicates...",n));
5136 if (n == 0) {
5137 debug(printf("\n"));
5138 return NULL;
5139 }
5140
5141 hits = (T *) MALLOCA(n * sizeof(T));
5142 List_fill_array_and_free((void **) hits,&hitlist);
5143
5144 if (ascendingp == true) {
5145 qsort(hits,n,sizeof(T),ascending_siteA_pos_cmp);
5146 } else {
5147 qsort(hits,n,sizeof(T),descending_siteA_pos_cmp);
5148 }
5149
5150 /* Check for duplicates */
5151 eliminate = (bool *) CALLOCA(n,sizeof(bool));
5152 for (i = 0; i < n; i++) {
5153 x = hits[i];
5154 j = i+1;
5155 while (j < n && hits[j]->siteA_pos == x->siteA_pos && hits[j]->genomicstart == x->genomicstart) {
5156 eliminate[j] = true;
5157 j++;
5158 }
5159 }
5160
5161 debug(j = 0);
5162 for (i = n-1; i >= 0; i--) {
5163 x = hits[i];
5164 if (eliminate[i] == false) {
5165 sorted = List_push(sorted,x);
5166 } else {
5167 Substring_free(&x);
5168 debug(j++);
5169 }
5170 }
5171 debug(printf("%d eliminated\n",j));
5172
5173 FREEA(hits);
5174 FREEA(eliminate);
5175
5176 return sorted;
5177 }
5178
5179 List_T
5180 Substring_sort_siteN_halves (List_T hitlist, bool ascendingp) {
5181 List_T sorted = NULL;
5182 T x, *hits;
5183 int n, i, j;
5184 bool *eliminate;
5185
5186 n = List_length(hitlist);
5187 debug(printf("Checking %d spliceends for duplicates...",n));
5188 if (n == 0) {
5189 debug(printf("\n"));
5190 return NULL;
5191 }
5192
5193 hits = (T *) MALLOCA(n * sizeof(T));
5194 List_fill_array_and_free((void **) hits,&hitlist);
5195
5196 if (ascendingp == true) {
5197 qsort(hits,n,sizeof(T),ascending_siteN_pos_cmp);
5198 } else {
5199 qsort(hits,n,sizeof(T),descending_siteN_pos_cmp);
5200 }
5201
5202 /* Check for duplicates */
5203 eliminate = (bool *) CALLOCA(n,sizeof(bool));
5204 for (i = 0; i < n; i++) {
5205 x = hits[i];
5206 j = i+1;
5207 while (j < n && hits[j]->siteN_pos == x->siteN_pos && hits[j]->genomicstart == x->genomicstart) {
47585208 eliminate[j] = true;
47595209 j++;
47605210 }
49455395 }
49465396
49475397 /* Note: this->chimera_knownp might not be set for GMAP alignments */
4948 if (this->chimera_knownp == true) {
5398 if (this->siteD_knownp == true) {
49495399 /* Note: IIT_get_typed_signed_with_divno does not work here */
49505400 splicesites = IIT_get_exact_multiple_with_divno(&nsplicesites,splicesites_iit,
49515401 splicesites_divint_crosstable[this->chrnum],
53275777 /* Handle result of substring_trim_novel_spliceends */
53285778 if (invertp == false) {
53295779 if (substring->start_endtype == DON) {
5330 FPRINTF(fp,"donor:%.2f",substring->trim_left,substring->chimera_prob);
5780 FPRINTF(fp,"donor:%.2f",substring->siteD_prob);
53315781 } else if (substring->start_endtype == ACC) {
5332 FPRINTF(fp,"acceptor:%.2f",substring->trim_left,substring->chimera_prob);
5782 FPRINTF(fp,"acceptor:%.2f",substring->siteA_prob);
53335783 } else {
53345784 FPRINTF(fp,"start:%d",substring->trim_left);
53355785 }
53365786 } else {
53375787 if (substring->end_endtype == DON) {
5338 FPRINTF(fp,"donor:%.2f",substring->trim_right,substring->chimera_prob_2);
5788 FPRINTF(fp,"donor:%.2f",substring->siteD_prob);
53395789 } else if (substring->end_endtype == ACC) {
5340 FPRINTF(fp,"acceptor:%.2f",substring->trim_right,substring->chimera_prob_2);
5790 FPRINTF(fp,"acceptor:%.2f",substring->siteA_prob);
53415791 } else {
53425792 FPRINTF(fp,"start:%d",substring->trim_right);
53435793 }
53735823 /* Handle result of substring_trim_novel_spliceends */
53745824 if (invertp == false) {
53755825 if (substring->end_endtype == DON) {
5376 FPRINTF(fp,"donor:%.2f",substring->trim_right,substring->chimera_prob_2);
5826 FPRINTF(fp,"donor:%.2f",substring->siteD_prob);
53775827 } else if (substring->end_endtype == ACC) {
5378 FPRINTF(fp,"acceptor:%.2f",substring->trim_right,substring->chimera_prob_2);
5828 FPRINTF(fp,"acceptor:%.2f",substring->siteA_prob);
53795829 } else {
53805830 FPRINTF(fp,"end:%d",substring->trim_right);
53815831 }
53825832 } else {
53835833 if (substring->start_endtype == DON) {
5384 FPRINTF(fp,"donor:%.2f",substring->trim_left,substring->chimera_prob);
5834 FPRINTF(fp,"donor:%.2f",substring->siteD_prob);
53855835 } else if (substring->start_endtype == ACC) {
5386 FPRINTF(fp,"acceptor:%.2f",substring->trim_left,substring->chimera_prob);
5836 FPRINTF(fp,"acceptor:%.2f",substring->siteA_prob);
53875837 } else {
53885838 FPRINTF(fp,"end:%d",substring->trim_left);
53895839 }
58096259 FPRINTF(fp,"\t");
58106260 if (sensedir == SENSE_FORWARD) {
58116261 if (invertp == false) {
5812 FPRINTF(fp,"start:%d..donor:%.2f",donor->trim_left,donor->chimera_prob);
6262 FPRINTF(fp,"start:%d..donor:%.2f",donor->trim_left,donor->siteD_prob);
58136263 label_tag = "label_2";
58146264 splice_dist_tag = "splice_dist_2";
58156265 } else {
5816 FPRINTF(fp,"donor:%.2f..end:%d",donor->chimera_prob,donor->trim_left);
6266 FPRINTF(fp,"donor:%.2f..end:%d",donor->siteD_prob,donor->trim_left);
58176267 label_tag = "label_1";
58186268 splice_dist_tag = "splice_dist_1";
58196269 }
58206270 } else if (sensedir == SENSE_ANTI) {
58216271 if (invertp == false) {
5822 FPRINTF(fp,"donor:%.2f..end:%d",donor->chimera_prob,donor->trim_right);
6272 FPRINTF(fp,"donor:%.2f..end:%d",donor->siteD_prob,donor->trim_right);
58236273 label_tag = "label_1";
58246274 splice_dist_tag = "splice_dist_1";
58256275 } else {
5826 FPRINTF(fp,"start:%d..donor:%.2f",donor->trim_right,donor->chimera_prob);
6276 FPRINTF(fp,"start:%d..donor:%.2f",donor->trim_right,donor->siteD_prob);
58276277 label_tag = "label_2";
58286278 splice_dist_tag = "splice_dist_2";
58296279 }
58306280 } else {
58316281 /* SENSE_NULL */
58326282 if (invertp == false) {
5833 FPRINTF(fp,"start:%d..splice:%.2f",donor->trim_left,donor->chimera_prob);
6283 FPRINTF(fp,"start:%d..splice:%.2f",donor->trim_left,donor->siteD_prob);
58346284 label_tag = "label_2";
58356285 splice_dist_tag = "splice_dist_2";
58366286 } else {
5837 FPRINTF(fp,"splice:%.2f..end:%d",donor->chimera_prob,donor->trim_left);
6287 FPRINTF(fp,"splice:%.2f..end:%d",donor->siteD_prob,donor->trim_left);
58386288 label_tag = "label_1";
58396289 splice_dist_tag = "splice_dist_1";
58406290 }
58706320 }
58716321
58726322 #ifdef CHECK_KNOWNI
5873 if (donor->chimera_knownp == false && splicesites_iit) {
6323 if (donor->siteD_knownp == false && splicesites_iit) {
58746324 if (donor->plusp == true) {
58756325 splicesitepos = donor->genomicstart - donor->chroffset + donor->chimera_pos;
58766326 } else {
58836333 }
58846334 #endif
58856335
5886 if (donor->chimera_knownp && splicesites_iit) {
5887 print_splicesite_labels(fp,donor,donor_typeint,donor->chimera_pos,label_tag);
6336 if (donor->siteD_knownp && splicesites_iit) {
6337 print_splicesite_labels(fp,donor,donor_typeint,donor->siteD_pos,label_tag);
58886338 }
58896339
58906340 if (allocp == true) {
59166366 FPRINTF(fp,"\t");
59176367 if (sensedir == SENSE_FORWARD) {
59186368 if (invertp == false) {
5919 FPRINTF(fp,"acceptor:%.2f..end:%d",acceptor->chimera_prob,acceptor->trim_right);
6369 FPRINTF(fp,"acceptor:%.2f..end:%d",acceptor->siteA_prob,acceptor->trim_right);
59206370 label_tag = "label_1";
59216371 splice_dist_tag = "splice_dist_1";
59226372 } else {
5923 FPRINTF(fp,"start:%d..acceptor:%.2f",acceptor->trim_right,acceptor->chimera_prob);
6373 FPRINTF(fp,"start:%d..acceptor:%.2f",acceptor->trim_right,acceptor->siteA_prob);
59246374 label_tag = "label_2";
59256375 splice_dist_tag = "splice_dist_2";
59266376 }
59276377 } else if (sensedir == SENSE_ANTI) {
59286378 if (invertp == false) {
5929 FPRINTF(fp,"start:%d..acceptor:%.2f",acceptor->trim_left,acceptor->chimera_prob);
6379 FPRINTF(fp,"start:%d..acceptor:%.2f",acceptor->trim_left,acceptor->siteA_prob);
59306380 label_tag = "label_2";
59316381 splice_dist_tag = "splice_dist_2";
59326382 } else {
5933 FPRINTF(fp,"acceptor:%.2f..end:%d",acceptor->chimera_prob,acceptor->trim_left);
6383 FPRINTF(fp,"acceptor:%.2f..end:%d",acceptor->siteA_prob,acceptor->trim_left);
59346384 label_tag = "label_1";
59356385 splice_dist_tag = "splice_dist_1";
59366386 }
59376387 } else {
59386388 /* SENSE_NULL */
59396389 if (invertp == false) {
5940 FPRINTF(fp,"splice:%.2f..end:%d",acceptor->chimera_prob,acceptor->trim_right);
6390 FPRINTF(fp,"splice:%.2f..end:%d",acceptor->siteA_prob,acceptor->trim_right);
59416391 label_tag = "label_1";
59426392 splice_dist_tag = "splice_dist_1";
59436393 } else {
5944 FPRINTF(fp,"start:%d..splice:%.2f",acceptor->trim_right,acceptor->chimera_prob);
6394 FPRINTF(fp,"start:%d..splice:%.2f",acceptor->trim_right,acceptor->siteA_prob);
59456395 label_tag = "label_2";
59466396 splice_dist_tag = "splice_dist_2";
59476397 }
59796429 #ifdef CHECK_KNOWNI
59806430 if (acceptor->chimera_knownp == false && splicesites_iit) {
59816431 if (acceptor->plusp == true) {
5982 splicesitepos = acceptor->genomicstart - acceptor->chroffset + acceptor->chimera_pos;
5983 } else {
5984 splicesitepos = acceptor->genomicstart - acceptor->chroffset - acceptor->chimera_pos;
6432 splicesitepos = acceptor->genomicstart - acceptor->chroffset + acceptor->siteA_pos;
6433 } else {
6434 splicesitepos = acceptor->genomicstart - acceptor->chroffset - acceptor->siteA_pos;
59856435 }
59866436 splicesites = IIT_get_exact_multiple_with_divno(&nsplicesites,splicesites_iit,
59876437 splicesites_divint_crosstable[acceptor->chrnum],
59916441 #endif
59926442
59936443
5994 if (acceptor->chimera_knownp && splicesites_iit) {
5995 print_splicesite_labels(fp,acceptor,acceptor_typeint,acceptor->chimera_pos,label_tag);
6444 if (acceptor->siteA_knownp && splicesites_iit) {
6445 print_splicesite_labels(fp,acceptor,acceptor_typeint,acceptor->siteA_pos,label_tag);
59966446 }
59976447
59986448
60516501
60526502 FPRINTF(fp,"\t");
60536503 if (sensedir == SENSE_FORWARD && invertp == false) {
6054 FPRINTF(fp,"acceptor:%.2f..donor:%.2f",shortexon->chimera_prob,shortexon->chimera_prob_2);
6504 FPRINTF(fp,"acceptor:%.2f..donor:%.2f",shortexon->siteA_prob,shortexon->siteD_prob);
60556505 } else if (sensedir == SENSE_FORWARD && invertp == true) {
6056 FPRINTF(fp,"donor:%.2f..acceptor:%.2f",shortexon->chimera_prob_2,shortexon->chimera_prob);
6506 FPRINTF(fp,"donor:%.2f..acceptor:%.2f",shortexon->siteD_prob,shortexon->siteA_prob);
60576507 } else if (sensedir == SENSE_ANTI && invertp == false) {
6058 FPRINTF(fp,"donor:%.2f..acceptor:%.2f",shortexon->chimera_prob_2,shortexon->chimera_prob);
6508 FPRINTF(fp,"donor:%.2f..acceptor:%.2f",shortexon->siteD_prob,shortexon->siteA_prob);
60596509 } else if (sensedir == SENSE_ANTI && invertp == true) {
6060 FPRINTF(fp,"acceptor:%.2f..donor:%.2f",shortexon->chimera_prob,shortexon->chimera_prob_2);
6510 FPRINTF(fp,"acceptor:%.2f..donor:%.2f",shortexon->siteA_prob,shortexon->siteD_prob);
60616511 }
60626512
60636513 FPRINTF(fp,",matches:%d,sub:%d",shortexon->nmatches,shortexon->nmismatches_bothdiff);
60726522 FPRINTF(fp,",dir:sense");
60736523 print_shortexon_splice_distances(fp,distance1,distance2);
60746524
6075 if (shortexon->chimera_knownp && splicesites_iit) {
6525 if (shortexon->siteA_knownp && splicesites_iit) {
60766526 print_splicesite_labels(fp,shortexon,acceptor_typeint,
6077 shortexon->chimera_pos,/*tag*/"label_1");
6078 }
6079 if (shortexon->chimera_knownp_2 && splicesites_iit) {
6527 shortexon->siteA_pos,/*tag*/"label_1");
6528 }
6529 if (shortexon->siteD_knownp && splicesites_iit) {
60806530 print_splicesite_labels(fp,shortexon,donor_typeint,
6081 shortexon->chimera_pos_2,/*tag*/"label_2");
6531 shortexon->siteD_pos,/*tag*/"label_2");
60826532 }
60836533
60846534 } else if (sensedir == SENSE_FORWARD && invertp == true) {
60856535 FPRINTF(fp,",dir:antisense");
60866536 print_shortexon_splice_distances(fp,distance1,distance2);
60876537
6088 if (shortexon->chimera_knownp_2 && splicesites_iit) {
6538 if (shortexon->siteD_knownp && splicesites_iit) {
60896539 print_splicesite_labels(fp,shortexon,donor_typeint,
6090 shortexon->chimera_pos_2,/*tag*/"label_1");
6091 }
6092 if (shortexon->chimera_knownp && splicesites_iit) {
6540 shortexon->siteD_pos,/*tag*/"label_1");
6541 }
6542 if (shortexon->siteA_knownp && splicesites_iit) {
60936543 print_splicesite_labels(fp,shortexon,acceptor_typeint,
6094 shortexon->chimera_pos,/*tag*/"label_2");
6544 shortexon->siteA_pos,/*tag*/"label_2");
60956545 }
60966546
60976547 } else if (sensedir == SENSE_ANTI && invertp == false) {
61006550
61016551
61026552
6103 if (shortexon->chimera_knownp_2 && splicesites_iit) {
6553 if (shortexon->siteD_knownp && splicesites_iit) {
61046554 print_splicesite_labels(fp,shortexon,donor_typeint,
6105 shortexon->chimera_pos_2,/*tag*/"label_1");
6106 }
6107
6108 if (shortexon->chimera_knownp && splicesites_iit) {
6555 shortexon->siteD_pos,/*tag*/"label_1");
6556 }
6557
6558 if (shortexon->siteA_knownp && splicesites_iit) {
61096559 print_splicesite_labels(fp,shortexon,acceptor_typeint,
6110 shortexon->chimera_pos,/*tag*/"label_2");
6560 shortexon->siteA_pos,/*tag*/"label_2");
61116561 }
61126562
61136563 } else if (sensedir == SENSE_ANTI && invertp == true) {
61146564 FPRINTF(fp,",dir:sense");
61156565 print_shortexon_splice_distances(fp,distance1,distance2);
6116 if (shortexon->chimera_knownp && splicesites_iit) {
6566 if (shortexon->siteA_knownp && splicesites_iit) {
61176567 print_splicesite_labels(fp,shortexon,acceptor_typeint,
6118 shortexon->chimera_pos,/*tag*/"label_1");
6119 }
6120 if (shortexon->chimera_knownp_2 && splicesites_iit) {
6568 shortexon->siteA_pos,/*tag*/"label_1");
6569 }
6570 if (shortexon->siteD_knownp && splicesites_iit) {
61216571 print_splicesite_labels(fp,shortexon,donor_typeint,
6122 shortexon->chimera_pos_2,/*tag*/"label_2");
6572 shortexon->siteD_pos,/*tag*/"label_2");
61236573 }
61246574 }
61256575
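The three Substring_sort_site*_halves functions in this file share one pattern: copy the list into an array, qsort it with a multi-key comparator, then drop every entry whose (site position, genomicstart) pair repeats the entry before it. Because the siteD and siteA comparators break ties by placing known sites first, the entry that survives a run of duplicates is a known site whenever one exists. The "descending" comparators in the diff still order site positions ascending; only the genomicstart tie-break is reversed. The sketch below shows the ascending variant on a pared-down record. `Hit_T`, `ascending_site_pos_cmp`, and `sort_and_dedupe` are hypothetical names, and unlike the original the sketch leaves freeing of dropped entries to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced record; the real substring type has many more fields. */
typedef struct {
  int site_pos;
  unsigned int genomicstart;
  bool knownp;
} Hit_T;

/* Orders by site position, then genomicstart, then known sites first. */
static int
ascending_site_pos_cmp (const void *a, const void *b) {
  const Hit_T *x = * (const Hit_T * const *) a;
  const Hit_T *y = * (const Hit_T * const *) b;

  if (x->site_pos < y->site_pos) {
    return -1;
  } else if (x->site_pos > y->site_pos) {
    return +1;
  } else if (x->genomicstart < y->genomicstart) {
    return -1;
  } else if (x->genomicstart > y->genomicstart) {
    return +1;
  } else if (x->knownp == true && y->knownp == false) {
    return -1;
  } else if (y->knownp == true && x->knownp == false) {
    return +1;
  } else {
    return 0;
  }
}

/* Sorts hits in place, keeps the first entry of each run that shares
   (site_pos, genomicstart), and returns the number of entries kept.
   Dropped pointers are not freed here. */
static int
sort_and_dedupe (Hit_T **hits, int n) {
  int i, nkept = 0;

  assert(hits != NULL || n == 0);
  if (n <= 0) {
    return 0;
  }

  qsort(hits,(size_t) n,sizeof(Hit_T *),ascending_site_pos_cmp);

  for (i = 0; i < n; i++) {
    if (nkept > 0 &&
        hits[i]->site_pos == hits[nkept-1]->site_pos &&
        hits[i]->genomicstart == hits[nkept-1]->genomicstart) {
      /* Duplicate of the last kept entry: skip it */
    } else {
      hits[nkept++] = hits[i];
    }
  }

  return nkept;
}
```

The original functions use an `eliminate` flag array and rebuild a list by pushing from the end, which gives the same survivors; the in-place compaction here is a simplification chosen to keep the sketch short.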
0 /* $Id: substring.h 195961 2016-08-08 16:36:34Z twu $ */
0 /* $Id: substring.h 196273 2016-08-12 15:15:06Z twu $ */
11 #ifndef SUBSTRING_INCLUDED
22 #define SUBSTRING_INCLUDED
33
1717 #include "junction.h"
1818 #include "intlist.h"
1919 #include "doublelist.h"
20 #include "list.h"
2021 #ifdef LARGE_GENOMES
2122 #include "uint8list.h"
2223 #else
3132
3233 extern char *
3334 Endtype_string (Endtype_T endtype);
35
36 extern char *
37 Trimaction_string (Trimaction_T trimaction);
3438
3539 extern void
3640 Substring_setup (bool print_nsnpdiffs_p_in, bool print_snplabels_p_in,
6064 int minlength, int sensedir);
6165
6266 extern T
63 Substring_new_ambig (int querystart, int queryend, int splice_pos, int querylength,
64 Chrnum_T chrnum, Univcoord_T chroffset,
65 Univcoord_T chrhigh, Chrpos_T chrlength,
66 bool plusp, int genestrand,
67 Substring_new_ambig_D (int querystart, int queryend, int splice_pos, int querylength,
68 Chrnum_T chrnum, Univcoord_T chroffset,
69 Univcoord_T chrhigh, Chrpos_T chrlength,
70 bool plusp, int genestrand,
6771 #ifdef LARGE_GENOMES
68 Uint8list_T ambcoords,
72 Uint8list_T ambcoords,
6973 #else
70 Uintlist_T ambcoords,
74 Uintlist_T ambcoords,
7175 #endif
72 Intlist_T amb_knowni, Intlist_T amb_nmismatches, Doublelist_T amb_probs,
73 double amb_common_prob, bool amb_donor_common_p, bool substring1p);
76 Intlist_T amb_knowni, Intlist_T amb_nmismatches, Doublelist_T amb_probs,
77 double amb_common_prob, bool substring1p);
78
79 extern T
80 Substring_new_ambig_A (int querystart, int queryend, int splice_pos, int querylength,
81 Chrnum_T chrnum, Univcoord_T chroffset,
82 Univcoord_T chrhigh, Chrpos_T chrlength,
83 bool plusp, int genestrand,
84 #ifdef LARGE_GENOMES
85 Uint8list_T ambcoords,
86 #else
87 Uintlist_T ambcoords,
88 #endif
89 Intlist_T amb_knowni, Intlist_T amb_nmismatches, Doublelist_T amb_probs,
90 double amb_common_prob, bool substring1p);
7491
7592 extern Univcoord_T
7693 Substring_set_unambiguous (double *donor_prob, double *acceptor_prob, Univcoord_T *genomicstart, Univcoord_T *genomicend,
106123 extern Univcoord_T
107124 Substring_left (T this);
108125 extern Univcoord_T
109 Substring_splicecoord (T this);
110 extern Chrpos_T
111 Substring_chr_splicecoord (T this);
112 extern int
113 Substring_splicesites_knowni (T this);
114 extern Univcoord_T
115126 Substring_splicecoord_A (T this);
116127 extern Univcoord_T
117128 Substring_splicecoord_D (T this);
129 extern Chrpos_T
130 Substring_chr_splicecoord_D (T this);
131 extern Chrpos_T
132 Substring_chr_splicecoord_A (T this);
133 extern int
134 Substring_splicesitesD_knowni (T this);
135 extern int
136 Substring_splicesitesA_knowni (T this);
118137
119138 extern bool
120139 Substring_plusp (T this);
225244 Substring_amb_acceptor_prob (T this);
226245
227246 extern double
247 Substring_siteD_prob (T this);
248 extern double
228249 Substring_siteA_prob (T this);
229 extern double
230 Substring_siteD_prob (T this);
231
232 extern double
233 Substring_chimera_prob (T this);
234 extern double
235 Substring_chimera_prob_2 (T this);
236 extern int
237 Substring_chimera_pos (T this);
238 extern int
239 Substring_chimera_pos_A (T this);
240 extern int
241 Substring_chimera_pos_D (T this);
250
251 extern int
252 Substring_siteD_pos (T this);
253 extern int
254 Substring_siteA_pos (T this);
255 extern int
256 Substring_siteN_pos (T this);
242257 extern int
243258 Substring_chimera_sensedir (T this);
244259
245260 extern bool
246261 Substring_ambiguous_p (T this);
262 extern bool
263 Substring_list_ambiguous_p (List_T list);
247264 extern int
248265 Substring_nambcoords (T this);
249266 extern Univcoord_T *
295312 Chrnum_T chrnum, Univcoord_T chroffset, Univcoord_T chrhigh, Chrpos_T chrlength);
296313
297314 extern List_T
298 Substring_sort_chimera_halves (List_T hitlist, bool ascendingp);
315 Substring_sort_siteD_halves (List_T hitlist, bool ascendingp);
316 extern List_T
317 Substring_sort_siteA_halves (List_T hitlist, bool ascendingp);
318 extern List_T
319 Substring_sort_siteN_halves (List_T hitlist, bool ascendingp);
299320
300321
301322 extern Chrpos_T
348369 Substring_add_intron (List_T pairs, T substringA, T substringB, int querylength,
349370 int hardclip_low, int hardclip_high, int queryseq_offset);
350371
351 extern void
352 Substring_trim_novel_spliceends (T substring1, T substringN, int *ambig_end_length_5, int *ambig_end_length_3,
353 Splicetype_T *ambig_splicetype_5, Splicetype_T *ambig_splicetype_3,
354 double *ambig_prob_5, double *ambig_prob_3, int *sensedir);
355
356372 #undef T
357373 #endif
358374
0 static char rcsid[] = "$Id: uniqscan.c 193877 2016-07-12 02:46:33Z twu $";
0 static char rcsid[] = "$Id: uniqscan.c 196438 2016-08-16 20:23:27Z twu $";
11 #ifdef HAVE_CONFIG_H
22 #include <config.h>
33 #endif
5858 #include "getopt.h"
5959
6060
61 #define MAX_FLOORS_READLENGTH 300
6162 #define MAX_QUERYLENGTH_FOR_ALLOC 100000
6263 #define MAX_GENOMICLENGTH_FOR_ALLOC 1000000
6364
392393 fprintf(stdout,"Sizes: off_t (%d), size_t (%d), unsigned int (%d), long int (%d), long long int (%d)\n",
393394 (int) sizeof(off_t),(int) sizeof(size_t),(int) sizeof(unsigned int),(int) sizeof(long int),(int) sizeof(long long int));
394395 fprintf(stdout,"Default gmap directory: %s\n",GMAPDB);
395 fprintf(stdout,"Maximum read length: %d\n",MAX_READLENGTH);
396 fprintf(stdout,"Maximum stack read length: %d\n",MAX_STACK_READLENGTH);
396397 fprintf(stdout,"Thomas D. Wu, Genentech, Inc.\n");
397398 fprintf(stdout,"Contact: twu@gene.com\n");
398399 fprintf(stdout,"\n");
447448 diagpool = Diagpool_new();
448449 cellpool = Cellpool_new();
449450
450 floors_array = (Floors_T *) CALLOC(MAX_READLENGTH+1,sizeof(Floors_T));
451 floors_array = (Floors_T *) CALLOC(MAX_FLOORS_READLENGTH+1,sizeof(Floors_T));
451452 /* Except_stack_create(); -- requires pthreads */
452453
453454 for (i = 0; i < 10; i++) {
552553
553554 }
554555
555 for (i = 0; i <= MAX_READLENGTH; i++) {
556 for (i = 0; i <= MAX_FLOORS_READLENGTH; i++) {
556557 if (floors_array[i] != NULL) {
557558 Floors_free_keep(&(floors_array[i]));
558559 }
13011302 nullgap,maxpeelback,maxpeelback_distalmedial,
13021303 extramaterial_end,extramaterial_paired,gmap_mode,
13031304 trigger_score_for_gmap,gmap_allowance,max_gmap_pairsearch,
1304 max_gmap_terminal,max_gmap_improvement,antistranded_penalty);
1305 max_gmap_terminal,max_gmap_improvement,antistranded_penalty,
1306 MAX_FLOORS_READLENGTH);
13051307 Substring_setup(/*print_nsnpdiffs_p*/false,/*print_snplabels_p*/false,
13061308 /*show_refdiff_p*/false,snps_iit,snps_divint_crosstable,
13071309 genes_iit,genes_divint_crosstable,
13211323 Pair_setup(trim_mismatch_score,trim_indel_score,/*gff3_separators_p*/false,/*sam_insert_0M_p*/false,
13221324 /*force_xs_direction_p*/false,/*md_lowercase_variant_p*/false,
13231325 /*snps_p*/snps_iit ? true : false,/*print_nsnpdiffs_p*/snps_iit ? true : false,
1324 Univ_IIT_genomelength(chromosome_iit,/*with_circular_alias*/false));
1326 Univ_IIT_genomelength(chromosome_iit,/*with_circular_alias*/false),
1327 /*gff3_phase_swap_p*/false);
13251328 Stage3_setup(/*splicingp*/novelsplicingp == true || knownsplicingp == true,novelsplicingp,
13261329 /*require_splicedir_p*/false,splicing_iit,splicing_divint_crosstable,
13271330 donor_typeint,acceptor_typeint,splicesites,altlocp,alias_starts,alias_ends,
0 /* $Id: univdiag.h 195760 2016-08-04 00:12:04Z twu $ */
0 /* $Id: univdiag.h 196273 2016-08-12 15:15:06Z twu $ */
11 #ifndef UNIVDIAG_INCLUDED
22 #define UNIVDIAG_INCLUDED
33
389389 LN_S = @LN_S@
390390 LTLIBOBJS = @LTLIBOBJS@
391391 LT_SYS_LIBRARY_PATH = @LT_SYS_LIBRARY_PATH@
392 MAINT = @MAINT@
392393 MAKEINFO = @MAKEINFO@
393394 MANIFEST_TOOL = @MANIFEST_TOOL@
394 MAX_READLENGTH = @MAX_READLENGTH@
395 MAX_STACK_READLENGTH = @MAX_STACK_READLENGTH@
395396 MKDIR_P = @MKDIR_P@
396397 MPICC = @MPICC@
397398 MPILIBS = @MPILIBS@
506507
507508 .SUFFIXES:
508509 .SUFFIXES: .log .test .test$(EXEEXT) .trs
509 $(srcdir)/Makefile.in: $(srcdir)/Makefile.am $(am__configure_deps)
510 $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
510511 @for dep in $?; do \
511512 case '$(am__configure_deps)' in \
512513 *$$dep*) \
530531 $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
531532 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
532533
533 $(top_srcdir)/configure: $(am__configure_deps)
534 $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
534535 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
535 $(ACLOCAL_M4): $(am__aclocal_m4_deps)
536 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
536537 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
537538 $(am__aclocal_m4_deps):
538539 align.test: $(top_builddir)/config.status $(srcdir)/align.test.in
230230 LN_S = @LN_S@
231231 LTLIBOBJS = @LTLIBOBJS@
232232 LT_SYS_LIBRARY_PATH = @LT_SYS_LIBRARY_PATH@
233 MAINT = @MAINT@
233234 MAKEINFO = @MAKEINFO@
234235 MANIFEST_TOOL = @MANIFEST_TOOL@
235 MAX_READLENGTH = @MAX_READLENGTH@
236 MAX_STACK_READLENGTH = @MAX_STACK_READLENGTH@
236237 MKDIR_P = @MKDIR_P@
237238 MPICC = @MPICC@
238239 MPILIBS = @MPILIBS@
356357 all: all-am
357358
358359 .SUFFIXES:
359 $(srcdir)/Makefile.in: $(srcdir)/Makefile.am $(am__configure_deps)
360 $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
360361 @for dep in $?; do \
361362 case '$(am__configure_deps)' in \
362363 *$$dep*) \
380381 $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
381382 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
382383
383 $(top_srcdir)/configure: $(am__configure_deps)
384 $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
384385 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
385 $(ACLOCAL_M4): $(am__aclocal_m4_deps)
386 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
386387 cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
387388 $(am__aclocal_m4_deps):
388389 gmap_compress.pl: $(top_builddir)/config.status $(srcdir)/gmap_compress.pl.in
3737 }
3838 @exons = ();
3939 $sortp = 0;
40 $gene_name = get_info(\@info,"gene_name","gene_id");
40 $gene_name = cat_info(\@info,"gene_id","gene_name");
4141 $last_transcript_id = $transcript_id;
4242 $chr = $fields[0];
4343 $strand = $fields[6];
106106 return "NA";
107107 }
108108
109 sub cat_info {
110 my $info = shift @_;
111 my @desired_keys = @_;
112 my @result = ();
113
114 foreach $desired_key (@desired_keys) {
115 foreach $item (@ {$info}) {
116 ($key,$value) = $item =~ /(\S+) (.+)/;
117 if ($key eq $desired_key) {
118 push @result,$value;
119 }
120 }
121 }
122
123 if ($#result < 0) {
124 print STDERR "Cannot find " . join(" or ",@desired_keys) . " in " . join("; ",@ {$info}) . "\n";
125 return "NA";
126 } else {
127 return join(" ",@result);
128 }
129 }
130
109131
110132 sub get_info_optional {
111133 my $info = shift @_;
107107 }
108108 @exons = ();
109109 $sortp = 0;
110 $gene_name = get_info(\@info,"gene_name","gene_id");
110 $gene_name = get_info(\@info,"gene_id","gene_name");
111111 $last_transcript_id = $transcript_id;
112112 $chr = $fields[0];
113113 $strand = $fields[6];
107107 }
108108 @exons = ();
109109 $sortp = 0;
110 $gene_name = get_info(\@info,"gene_name","gene_id");
110 $gene_name = get_info(\@info,"gene_id","gene_name");
111111 $last_transcript_id = $transcript_id;
112112 $chr = $fields[0];
113113 $strand = $fields[6];