# Roary - The pan genome pipeline
Takes annotated assemblies in GFF3 format and calculates the pan genome.

[![Build Status](https://travis-ci.org/sanger-pathogens/Roary.svg?branch=master)](https://travis-ci.org/sanger-pathogens/Roary)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/sanger-pathogens/roary/blob/master/GPL-LICENSE)
[![status](https://img.shields.io/badge/Bioinformatics-10.1093-brightgreen.svg)](https://academic.oup.com/bioinformatics/article/31/22/3691/240757)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/recipes/roary/README.html)
[![Container ready](https://img.shields.io/badge/container-ready-brightgreen.svg)](https://quay.io/repository/biocontainers/roary)
[![Docker Build Status](https://img.shields.io/docker/build/sangerpathogens/roary.svg)](https://hub.docker.com/r/sangerpathogens/roary)
[![Docker Pulls](https://img.shields.io/docker/pulls/sangerpathogens/roary.svg)](https://hub.docker.com/r/sangerpathogens/roary)
[![codecov](https://codecov.io/gh/sanger-pathogens/roary/branch/master/graph/badge.svg)](https://codecov.io/gh/sanger-pathogens/roary)

## Contents
* [Introduction](#introduction)
* [Installation](#installation)
  * [Required dependencies](#required-dependencies)
  * [Optional dependencies](#optional-dependencies)
  * [Ubuntu/Debian](#ubuntudebian)
    * [Debian Testing](#debian-testing)
    * [Ubuntu 14\.04/16\.04](#ubuntu-14041604)
    * [Ubuntu 12\.04](#ubuntu-1204)
  * [Bioconda \- OSX/Linux](#bioconda---osxlinux)
  * [Galaxy](#galaxy)
  * [GNU Guix](#gnu-guix)
  * [Virtual Machine \- OSX/Linux/Windows](#virtual-machine---osxlinuxwindows)
  * [Docker \- OSX/Linux/Windows/Cloud](#docker---osxlinuxwindowscloud)
  * [Installing from source (advanced Linux users only)](#installing-from-source-advanced-linux-users-only)
  * [Ancient systems and versions of perl](#ancient-systems-and-versions-of-perl)
  * [Versions of software we test against](#versions-of-software-we-test-against)
* [Usage](#usage)
* [License](#license)
* [Feedback/Issues](#feedbackissues)
* [Citation](#citation)
* [Further Information](#further-information)

## Introduction
Roary is a high-speed, stand-alone pan genome pipeline which takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples without compromising the quality of the results, something which is computationally infeasible with existing methods. For example, 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor, whereas performing this analysis with existing methods would take weeks and hundreds of GB of RAM.

## Installation
Roary has the following dependencies:

### Required dependencies
* [bedtools](https://bedtools.readthedocs.io/en/latest/)
* [cd-hit](http://weizhongli-lab.org/cd-hit/)
* [ncbi-blast+](https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
* [mcl](https://micans.org/mcl/)
* [parallel](https://www.gnu.org/software/parallel/)
* [prank](http://wasabiapp.org/software/prank/)
* [mafft](https://mafft.cbrc.jp/alignment/software/)
* [fasttree](http://www.microbesonline.org/fasttree/)

### Optional dependencies
* [kraken](http://ccb.jhu.edu/software/kraken/MANUAL.html)
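
Roary can report on its own dependencies with `roary -a` (see Usage). If you want to check your PATH before installing Roary itself, a minimal shell sketch follows. `check_commands` is a hypothetical helper, and the executable names in the example comment are assumptions that may differ between distributions (for instance `cd-hit` vs `cdhit`, or `FastTree` vs `fasttree`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command that is not on PATH.
# Returns 1 if any command is missing, 0 otherwise.
check_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example (executable names are assumptions; adjust for your system):
# check_commands bedtools cd-hit blastp makeblastdb mcl mcxdeblast parallel prank mafft FastTree
```

Each missing tool is printed on its own line, so the output can be used directly as a to-do list for installation.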

There are a number of ways to install Roary, detailed below. If you encounter an issue when installing Roary, please contact your local system administrator. If you encounter a bug, please log it on the [issues page](https://github.com/sanger-pathogens/Roary/issues) or email us at roary-help@sanger.ac.uk.

### Ubuntu/Debian
#### Debian Testing
```
sudo apt-get install roary
```

#### Ubuntu 14.04/16.04
All the dependencies can be installed using apt and cpanm. Root permissions are required. Ubuntu 16.04 contains a package for Roary, but it is frozen at v3.6.0.

```
sudo cpanm -f Bio::Roary
```

#### Ubuntu 12.04
Some of the software versions in apt are quite old, so follow the instructions for Bioconda below.

### Bioconda - OSX/Linux
Install conda, then set up Bioconda and install Roary:

```
conda install roary
```

### Galaxy
Roary is available from the Galaxy toolshed (as is Prokka).

### GNU Guix
Roary is included in [Guix](https://www.gnu.org/software/guix) and can be installed in the usual way:
```
guix package --install roary
```

### Virtual Machine - OSX/Linux/Windows
Roary won't run natively on Windows, but we have created a virtual machine which has all of the software set up, including Prokka, along with the test datasets from the paper. It is based on [Bio-Linux 8](http://environmentalomics.org/bio-linux/). You need to first install [VirtualBox](https://www.virtualbox.org/), then load the virtual machine using the 'File -> Import Appliance' menu option. The root password is 'manager'.

ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova

More importantly, though, if you're trying to do bioinformatics on Windows, you're not going to get very far, and you should seriously consider upgrading to Linux.

### Docker - OSX/Linux/Windows/Cloud
We have a Docker container which is automatically built from the latest version of Roary in Debian Med. To install it:

```
docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/roary roary -f /data /data/*.gff
```
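
In the `docker run` command above, `/data/*.gff` is expanded by the shell on the host before Docker starts, so it only matches files if the same path also exists on the host; in bash, an unmatched pattern is passed to the container as a literal string. A minimal sketch that defers the expansion to a shell inside the container is below. It assumes the image provides `sh`; the helper name `roary_in_docker`, the `DOCKER` override and the `roary_output` directory are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Roary from the sangerpathogens/roary image on every
# *.gff file in a host directory. Usage: roary_in_docker /path/to/gff_dir
roary_in_docker() {
  local data_dir
  # docker run -v treats a host path that is not absolute as a volume name,
  # so resolve the directory to an absolute path first.
  data_dir=$(cd "$1" && pwd) || return 1
  # The single-quoted command is run by sh inside the container, so *.gff is
  # matched against the mounted /data. Results go to /data/roary_output, which
  # is inside the mounted host directory and therefore survives --rm.
  # -it is omitted so this also works from non-interactive scripts.
  "${DOCKER:-docker}" run --rm -v "$data_dir":/data sangerpathogens/roary \
    sh -c 'roary -f /data/roary_output /data/*.gff'
}
```

For example, `roary_in_docker /home/ubuntu/data` mounts the same directory as the command above.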

### Installing from source (advanced Linux users only)
As a last resort you can install everything from source. This is for users with advanced Linux skills, and we do not provide any support for this method since you have the skills to figure things out.
Download the latest software from https://github.com/sanger-pathogens/Roary/tarball/master.

```
bedtools cd-hit blast mcl GNUparallel prank mafft fasttree
```

### Ancient systems and versions of perl
The code will not work with perl 5.8 or below (pre-modern perl). We no longer test against 5.10 (released 2007) or 5.12 (released 2010). If you're running a very old version of Linux, you're also in trouble.

### Versions of software we test against
* Perl 5.14, 5.26
* cdhit 4.6.8
* ncbi blast+ 2.6.0
* prank 140603
* GNU parallel 20170822, 20160722
* FastTree 2.1.9

## Usage
```
Usage:   roary [options] *.gff

Options: -p INT    number of threads [1]
         -o STR    clusters output filename [clustered_proteins]
         -f STR    output directory [.]
         -e        create a multiFASTA alignment of core genes using PRANK
         -n        fast core gene alignment with MAFFT, use with -e
         -i        minimum percentage identity for blastp [95]
         -cd FLOAT percentage of isolates a gene must be in to be core [99]
         -qc       generate QC report with Kraken
         -k STR    path to Kraken database for QC, use with -qc
         -a        check dependancies and print versions
         -b STR    blastp executable [blastp]
         -c STR    mcl executable [mcl]
         -d STR    mcxdeblast executable [mcxdeblast]
         -g INT    maximum number of clusters [50000]
         -m STR    makeblastdb executable [makeblastdb]
         -r        create R plots, requires R and ggplot2
         -s        dont split paralogs
         -t INT    translation table [11]
         -ap       allow paralogs in core alignment
         -z        dont delete intermediate files
         -v        verbose output to STDOUT
         -w        print version and exit
         -y        add gene inference information to spreadsheet, doesnt work with -e
         -iv STR   Change the MCL inflation value [1.5]
         -h        this help message

Example: Quickly generate a core gene alignment using 8 threads
         roary -e --mafft -p 8 *.gff

For further info see: http://sanger-pathogens.github.io/Roary/
```
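
As an illustration of the options above, a small shell wrapper that collects the `.gff` files in a directory and builds a core gene alignment with MAFFT might look like the sketch below. `run_roary` and the `ROARY` override are hypothetical; only options shown in the help text are passed to Roary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Roary on every *.gff file in a directory.
# Usage: run_roary GFF_DIR OUT_DIR [THREADS]
# ROARY overrides the executable name (e.g. a stub for a dry run).
run_roary() {
  local gff_dir=$1 out_dir=$2 threads=${3:-1}
  local f
  local gffs=()
  for f in "$gff_dir"/*.gff; do
    # An unmatched glob is left as a literal pattern, so keep only real files.
    if [ -e "$f" ]; then
      gffs+=("$f")
    fi
  done
  if [ "${#gffs[@]}" -eq 0 ]; then
    echo "no .gff files found in $gff_dir" >&2
    return 1
  fi
  # -e -n: core gene alignment with MAFFT; -p: threads; -f: output directory
  "${ROARY:-roary}" -e -n -p "$threads" -f "$out_dir" "${gffs[@]}"
}
```

For example, `run_roary ./annotations ./roary_results 8` (hypothetical paths) would align core genes with 8 threads, as in the help text's example, and write the results to `./roary_results`.
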
For further instructions on how to use the software, the input format and output formats, please see [the Roary website](http://sanger-pathogens.github.io/Roary).
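
The `-cd` option is a percentage, so the number of isolates a gene must appear in depends on the size of the dataset. The sketch below computes that minimum, assuming the threshold is inclusive (a gene present in exactly `-cd` percent of isolates counts as core); `min_core_isolates` is a hypothetical helper, not part of Roary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum number of isolates a gene must be present in
# to be called core, for N isolates and a -cd percentage (default 99).
# Assumes an inclusive threshold, i.e. ceil(N * pct / 100).
min_core_isolates() {
  local n=$1 pct=${2:-99}
  awk -v n="$n" -v pct="$pct" 'BEGIN {
    t = n * pct / 100
    c = int(t)
    if (c < t) c++   # round up to the next whole isolate
    print c
  }'
}
```

With the default of 99, `min_core_isolates 128` prints 127: 128 × 0.99 = 126.72, which rounds up to 127, so under this assumption a gene missing from two of the 128 samples would not be core.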

## License
Roary is free software, licensed under [GPLv3](https://github.com/sanger-pathogens/Roary/blob/master/GPL-LICENSE).

## Feedback/Issues
Please report any issues to the [issues page](https://github.com/sanger-pathogens/Roary/issues) or email roary-help@sanger.ac.uk.

## Citation
If you use this software please cite:

"Roary: Rapid large-scale prokaryote pan genome analysis",
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
Bioinformatics, (2015). doi: http://dx.doi.org/10.1093/bioinformatics/btv421
[Roary: Rapid large-scale prokaryote pan genome analysis](http://dx.doi.org/10.1093/bioinformatics/btv421)

## Further Information
For more information on this software see:
* [The Roary website](http://sanger-pathogens.github.io/Roary)
* [The Jupyter notebook tutorial](https://github.com/sanger-pathogens/pathogen-informatics-training)