[![travis-ci](https://travis-ci.org/conchoecia/pauvre.svg?branch=master)](https://travis-ci.org/conchoecia/pauvre) [![DOI](https://zenodo.org/badge/112774670.svg)](https://zenodo.org/badge/latestdoi/112774670)

## <a name="started"></a>Getting Started

```
pauvre custommargin -i custom.tsv --ycol length --xcol qual # Custom tsv input
```

## Table of Contents

- [Getting Started](#started)
- [Users' Guide](#uguide)
- [Installation](#installation)
  - [Requirements](#reqs)
  - [Install Instructions](#install)
- [Usage](#usage)
  - [pauvre stats](#stats)
  - [pauvre marginplot](#marginplot)
    - [Basic Usage](#marginbasic)
    - [Plot Adjustments](#marginadjustments)
    - [Specialized Options](#marginspecialized)
- [Contributors](#contributors)

## <a name="uguide"></a>Users' Guide

Pauvre is a plotting package originally designed to help QC the length and quality distribution of Oxford Nanopore or PacBio reads. The main outputs are marginplots. `pauvre` now also hosts additional data plotting scripts.

This package currently hosts five scripts for plotting and/or printing stats.

- `pauvre marginplot`
  - Takes a fastq file as input and outputs a marginal histogram with a heatmap.
- `pauvre custommargin`
  - Takes a tsv as input and outputs a marginal histogram with custom columns of your choice.
- `pauvre stats`
  - Takes a fastq file as input and prints out a table of stats, including how many basepairs/reads there are for a length/mean quality cutoff.
  - This is also called automatically when using `pauvre marginplot`.
- `pauvre redwood`
  - I am happy to introduce the redwood plot to the world as a method of representing circular genomes. A redwood plot contains long reads as "rings" on the inside, a gene annotation "cambium/phloem", and an RNAseq "bark". The input is `.bam` files for the long reads and RNAseq data, and a `.gff` file for the annotation. More details to follow as we document this program better...
- `pauvre synteny`
  - Makes a synteny plot of circular genomes. Finds the most parsimonious rotation to display the synteny of all the input genomes with the fewest crossings-over. Input is one `.gff` file per circular genome and one directory of gene alignments.

## <a name="installation"></a>Installation

### <a name="reqs"></a>Requirements

- You must have the following installed on your system to install this software:
  - python 3.x
  - matplotlib
  - biopython
  - pandas
  - pillow

### <a name="install"></a>Install Instructions

- Instructions to install on your mac or linux system. Not sure on Windows! Make sure *python 3* is the active environment before installing.
  - `git clone https://github.com/conchoecia/pauvre.git`
  - `cd ./pauvre`
  - `pip3 install .`
- Or, install with pip
  - `pip3 install pauvre`

## <a name="usage"></a>Usage

### <a name="stats"></a>`stats`

- Generate basic statistics about the fastq file. For example, if I want to know the number of bases and reads with a mean PHRED score of AT LEAST 5 and a read length of AT LEAST 500, run the program as below and look at the cells highlighted with `<braces>`.
  - `pauvre stats --fastq miniDSMN15.fastq`

```
numReads: 1000
numBasepairs: 1029114
meanLen: 1029.114
medianLen: 875.5
minLen: 11
maxLen: 5337
N50: 1278
L50: 296

          Basepairs >= bin by mean PHRED and length
minLen       Q0        Q5     Q10     Q15   Q17.5    Q20  Q21.5   Q25  Q25.5  Q30
     0  1029114   1010681  935366  429279  143948  25139   3668  2938   2000    0
   500   984212  <968653>  904787  421307  142003  24417   3668  2938   2000    0
  1000   659842    649319  616788  300948  103122  17251   2000  2000   2000    0
          et cetera...

          Number of reads >= bin by mean Phred+Len
minLen    Q0     Q5  Q10  Q15  Q17.5  Q20  Q21.5  Q25  Q25.5  Q30
     0  1000    969  865  366    118   22      3    2      1    0
   500   873  <859>  789  347    113   20      3    2      1    0
  1000   424    418  396  187     62   11      1    1      1    0
          et cetera...
```

### <a name="marginplot"></a>`marginplot`

#### <a name="marginbasic"></a>Basic Usage

- automatically calls `pauvre stats` for each fastq file
- Make the default plot showing the 99th percentile of longest reads
  - `pauvre marginplot --fastq miniDSMN15.fastq`
  - ![default](files/default_miniDSMN15.png)
- Make a marginal histogram for ONT 2D or 1D^2 cDNA data with a lower maxlen and higher maxqual.
  - `pauvre marginplot --maxlen 4000 --maxqual 25 --lengthbin 50 --fileform pdf png --qualbin 0.5 --fastq miniDSMN15.fastq`
  - ![example1](files/miniDSMN15.png)

#### <a name="marginadjustments"></a>Plot Adjustments

- Filter out reads with a mean quality less than 5, and a length less than 800. Zoom in to plot only mean quality of at least 4 and read length at least 500bp.
  - `pauvre marginplot -f miniDSMN15.fastq --filt_minqual 5 --filt_minlen 800 -y --plot_minlen 500 --plot_minqual 4`
  - ![test4](files/test4.png)

#### <a name="marginspecialized"></a>Specialized Options

- Plot ONT 1D data with a large tail
  - `pauvre marginplot --maxlen 100000 --maxqual 15 --lengthbin 500 <myfile>.fastq`
- Get more resolution on lengths
  - `pauvre marginplot --maxlen 100000 --lengthbin 5 <myfile>.fastq`
- Turn off transparency if you just want a white background
  - `pauvre marginplot --transparent False <myfile>.fastq`
  - Note: transparency is the default behavior
  - ![transparency](files/transparency.001.jpeg)

## <a name="contributors"></a>Contributors

@conchoecia (Darrin Schultz)
@mebbert (Mark Ebbert)
@wdecoster (Wouter De Coster)
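
## <a name="statsnote"></a>Note on Reading the `stats` Output

The `pauvre stats` table reports N50, L50, and counts of reads and basepairs at or above a length and mean-quality cutoff. The sketch below illustrates how such numbers are commonly derived. It is not pauvre's own code: the function names `n50_l50` and `count_at_least` are hypothetical, the read list is invented toy data, and pauvre may compute mean quality or N50 differently. Here N50 is taken as the length of the shortest read among the longest reads that together cover at least half of all bases, and L50 as the number of reads in that set.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of read lengths (common definition, assumed)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first read that pushes the running total past half the bases
        # defines N50; its rank in the sorted list is L50.
        if running >= half:
            return length, count
    return 0, 0  # empty input


def count_at_least(reads, min_len, min_qual):
    """Return (num_reads, num_bases) for reads meeting both cutoffs.

    reads is a list of (length, mean_quality) tuples.
    """
    kept = [length for length, qual in reads
            if length >= min_len and qual >= min_qual]
    return len(kept), sum(kept)


# Toy data, invented for illustration: (read length, mean PHRED quality)
reads = [(1200, 11.2), (800, 6.5), (450, 9.0), (3000, 4.1), (950, 14.8)]

n50, l50 = n50_l50([length for length, _ in reads])
num_reads, num_bases = count_at_least(reads, min_len=500, min_qual=5)
print(n50, l50)              # 1200 2
print(num_reads, num_bases)  # 3 2950
```

Each cell in the two tables printed by `pauvre stats` corresponds to one such (minimum length, minimum quality) pair, which is why values never increase when moving right or down.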
Commit History @3b4a94511887e93e763aa9e89e886fc7c6ad3395
- Upstream applied several patches Andreas Tille 3 years ago
- Import new upstream version Andreas Tille 3 years ago
- Update upstream source from tag 'upstream/0.2.1' Andreas Tille 3 years ago
- New upstream version 0.2.1 Andreas Tille 3 years ago
- The competing ITP was closed by somebody else Andreas Tille 3 years ago
- Upstream fixed version numbering scheme, asked for rejection of the previous upload to new Andreas Tille 3 years ago
- Revert "Extend README.test" Andreas Tille 3 years ago
- Document how to run the test Andreas Tille 3 years ago
- Extend README.test Andreas Tille 3 years ago
- typo-redwood.patch applied upstream Etienne Mollier 3 years ago
- Upload to new Andreas Tille 3 years ago
- Use createmanpages to get manpage that is not refering to not existing info page Andreas Tille 3 years ago
- added skeleton manual page from html2man Etienne Mollier 3 years ago
- upstream patch fix kwargs Etienne Mollier 3 years ago
- adapted setup.py for scikit-learn Etienne Mollier 3 years ago
- Try autopkgtest Andreas Tille 3 years ago
- Install test script in examples Andreas Tille 3 years ago
- PYBUILD_NAME=pauvre and cleanuo d/rules Andreas Tille 3 years ago
- Add <!nocheck> to some Build-Depends Andreas Tille 3 years ago
- clean some input file landing in dist-packages Etienne Mollier 3 years ago
- patch to provide arguments to lsi test Etienne Mollier 3 years ago
- enabled synplot testing Etienne Mollier 3 years ago
- moved away from patch to use a placeholder pauvre Etienne Mollier 3 years ago
- using more robust absolute python path Etienne Mollier 3 years ago
- added build-deps for build time testing Etienne Mollier 3 years ago
- patch for build test failing to start Etienne Mollier 3 years ago
- p/redwood.py: fixed what looks like a typo Etienne Mollier 3 years ago
- Port remaining Python2 script Andreas Tille 4 years ago
- Initial packaging Andreas Tille 4 years ago
- New upstream version 0.1924 Andreas Tille 4 years ago