# Bioparser

Bioparser is a C++ implementation of parsers for several bioinformatics formats. It consists of a single header file containing template parsers for the FASTA, FASTQ, MHAP, PAF and SAM formats. It also supports files compressed with gzip.

## Dependencies
1. gcc 4.8+ or clang 3.4+
2. (optional) cmake 3.2+

## Installation
To build the Bioparser unit tests, run the following commands:

    git clone --recursive bioparser
    cd bioparser
    mkdir build
    cd build
    cmake -Dbioparser_build_tests=ON -DCMAKE_BUILD_TYPE=Release ..
    make

After the build finishes, an executable named `bioparser_test` will be created in `build/bin`.

## Usage

To add Bioparser to your project, add the following commands to your CMakeLists.txt file: `add_subdirectory(vendor/bioparser EXCLUDE_FROM_ALL)` and `target_link_libraries(your_exe bioparser)`. If you are not using CMake, include the header `bioparser.hpp` in your project and install zlib on your machine.

For details on how to use the parsers in your code, see the examples below:

*Note:* By default, Bioparser trims sequence headers at the first white space. To disable this behavior, pass `false` as the last argument of the member function `bioparser::Parser::parse`.
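
To make the note concrete, here is a minimal sketch of what trimming a header at the first white space means for the `name_length` a constructor receives. The helper `trimmed_length` is hypothetical and not part of Bioparser; it only illustrates the effect.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Bioparser): returns the length of the
// header up to, but not including, the first white space character.
std::uint32_t trimmed_length(const char* name, std::uint32_t name_length) {
    for (std::uint32_t i = 0; i < name_length; ++i) {
        if (std::isspace(static_cast<unsigned char>(name[i]))) {
            return i;
        }
    }
    return name_length;  // no white space found, nothing to trim
}

// Usage sketch: the header "read1 length=100" keeps only "read1".
void trimmed_length_example() {
    const char* header = "read1 length=100";
    auto full = static_cast<std::uint32_t>(std::strlen(header));
    std::string trimmed(header, trimmed_length(header, full));
    assert(trimmed == "read1");
}
```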

    // define a class for sequences in FASTA format
    class Example1 {
    public:
        // required signature for the constructor
        Example1(
            const char* name, std::uint32_t name_length,
            const char* sequence, std::uint32_t sequence_length) {
            // your implementation
        }
    };

    std::vector<std::unique_ptr<Example1>> fasta_objects;
    auto fasta_parser = bioparser::createParser<bioparser::FastaParser, Example1>(path_to_file);
    // read the whole file
    fasta_parser->parse(fasta_objects, -1);
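
As a sketch of what `// your implementation` might contain, the class below copies the name and the sequence into `std::string` members using the explicit lengths. It assumes the pointers passed to the constructor are neither null-terminated nor valid after the call, which is why both buffers are copied. The class name `Sequence` and its members are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative class (names are assumptions) with the FASTA constructor
// signature shown above.
class Sequence {
public:
    Sequence(
        const char* name, std::uint32_t name_length,
        const char* sequence, std::uint32_t sequence_length)
            : name_(name, name_length),
            sequence_(sequence, sequence_length) {
    }

    const std::string& name() const { return name_; }
    const std::string& sequence() const { return sequence_; }

private:
    std::string name_;
    std::string sequence_;
};

// Usage sketch: construct directly from a buffer with explicit lengths.
void sequence_example() {
    const char buffer[] = "read1ACGT";
    Sequence s(buffer, 5, buffer + 5, 4);
    assert(s.name() == "read1");
    assert(s.sequence() == "ACGT");
}
```

Copying costs one allocation per field, but it keeps the object independent of whatever buffer the data came from.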

    // define a class for sequences in FASTQ format
    class Example2 {
    public:
        // required signature for the constructor
        Example2(
            const char* name, std::uint32_t name_length,
            const char* sequence, std::uint32_t sequence_length,
            const char* quality, std::uint32_t quality_length) {
            // your implementation
        }
    };

    std::vector<std::unique_ptr<Example2>> fastq_objects;
    auto fastq_parser = bioparser::createParser<bioparser::FastqParser, Example2>(path_to_file2);
    // read the file in chunks of a predefined size in bytes
    std::uint64_t size_in_bytes = 500 * 1024 * 1024; // 500 MB
    while (true) {
        auto status = fastq_parser->parse(fastq_objects, size_in_bytes);
        // do some work with objects
        if (status == false) {
            break;
        }
    }
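
The chunked loop can be exercised without an input file by substituting a stand-in parser. `MockParser` below is hypothetical and only mimics the contract used above, assuming that `parse` appends objects to the vector and returns `false` once the input is exhausted. Under that assumption, the vector is cleared after each chunk so that memory use stays bounded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Record {
    std::uint32_t id;
};

// Hypothetical stand-in for a parser: hands out `total` records, at most
// `per_call` per call, and returns false when none remain.
class MockParser {
public:
    MockParser(std::uint32_t total, std::uint32_t per_call)
            : remaining_(total), per_call_(per_call), next_id_(0) {
    }

    bool parse(std::vector<std::unique_ptr<Record>>& dst, std::uint64_t /*max_bytes*/) {
        for (std::uint32_t i = 0; i < per_call_ && remaining_ > 0; ++i, --remaining_) {
            dst.push_back(std::unique_ptr<Record>(new Record{next_id_++}));
        }
        return remaining_ > 0;
    }

private:
    std::uint32_t remaining_;
    std::uint32_t per_call_;
    std::uint32_t next_id_;
};

// Processes all records chunk by chunk and returns how many were seen.
std::size_t count_in_chunks(MockParser& parser, std::uint64_t chunk_bytes) {
    std::vector<std::unique_ptr<Record>> objects;
    std::size_t seen = 0;
    while (true) {
        bool status = parser.parse(objects, chunk_bytes);
        seen += objects.size();  // do some work with the current chunk
        objects.clear();         // release the chunk before reading the next one
        if (status == false) {
            break;
        }
    }
    return seen;
}
```

Note that the last call still delivers objects even though it returns `false`, so the work step has to run before the exit check.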

    // define a class for overlaps in MHAP format
    class Example3 {
    public:
        // required signature for the constructor
        Example3(
            std::uint64_t a_id,
            std::uint64_t b_id,
            double eq_bases_perc,
            std::uint32_t minmers,
            std::uint32_t a_rc,
            std::uint32_t a_begin,
            std::uint32_t a_end,
            std::uint32_t a_length,
            std::uint32_t b_rc,
            std::uint32_t b_begin,
            std::uint32_t b_end,
            std::uint32_t b_length) {
            // your implementation
        }
    };

    std::vector<std::unique_ptr<Example3>> mhap_objects;
    auto mhap_parser = bioparser::createParser<bioparser::MhapParser, Example3>(path_to_file3);
    mhap_parser->parse(mhap_objects, -1);

    // define a class for overlaps in PAF format or add a constructor to an existing overlap class
    // (the constructor must also be declared inside Example3)
    Example3::Example3(
        const char* q_name, std::uint32_t q_name_length,
        std::uint32_t q_length,
        std::uint32_t q_begin,
        std::uint32_t q_end,
        char orientation,
        const char* t_name, std::uint32_t t_name_length,
        std::uint32_t t_length,
        std::uint32_t t_begin,
        std::uint32_t t_end,
        std::uint32_t matching_bases,
        std::uint32_t overlap_length,
        std::uint32_t mapping_quality) {
        // your implementation
    }

    std::vector<std::unique_ptr<Example3>> paf_objects;
    auto paf_parser = bioparser::createParser<bioparser::PafParser, Example3>(path_to_file4);
    paf_parser->parse(paf_objects, -1);
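
One way to read "add a constructor to an existing overlap class" is a single class that accepts both the MHAP and the PAF signature. The sketch below is an assumption about how such a class could look: the name `Overlap`, its members, and the choice to store MHAP identifiers as strings are all illustrative, and unused parameters are left unnamed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative overlap class that can be built from an MHAP or a PAF record.
class Overlap {
public:
    // MHAP signature (12 parameters)
    Overlap(
        std::uint64_t a_id, std::uint64_t b_id,
        double /*eq_bases_perc*/, std::uint32_t /*minmers*/,
        std::uint32_t a_rc, std::uint32_t a_begin, std::uint32_t a_end,
        std::uint32_t /*a_length*/,
        std::uint32_t b_rc, std::uint32_t b_begin, std::uint32_t b_end,
        std::uint32_t /*b_length*/)
            : q_name_(std::to_string(a_id)), t_name_(std::to_string(b_id)),
            q_begin_(a_begin), q_end_(a_end), t_begin_(b_begin), t_end_(b_end),
            same_strand_(a_rc == b_rc) {
    }

    // PAF signature (14 parameters)
    Overlap(
        const char* q_name, std::uint32_t q_name_length,
        std::uint32_t /*q_length*/, std::uint32_t q_begin, std::uint32_t q_end,
        char orientation,
        const char* t_name, std::uint32_t t_name_length,
        std::uint32_t /*t_length*/, std::uint32_t t_begin, std::uint32_t t_end,
        std::uint32_t /*matching_bases*/, std::uint32_t /*overlap_length*/,
        std::uint32_t /*mapping_quality*/)
            : q_name_(q_name, q_name_length), t_name_(t_name, t_name_length),
            q_begin_(q_begin), q_end_(q_end), t_begin_(t_begin), t_end_(t_end),
            same_strand_(orientation == '+') {
    }

    const std::string& query_name() const { return q_name_; }
    const std::string& target_name() const { return t_name_; }
    std::uint32_t query_span() const { return q_end_ - q_begin_; }
    std::uint32_t target_span() const { return t_end_ - t_begin_; }
    bool same_strand() const { return same_strand_; }

private:
    std::string q_name_;
    std::string t_name_;
    std::uint32_t q_begin_;
    std::uint32_t q_end_;
    std::uint32_t t_begin_;
    std::uint32_t t_end_;
    bool same_strand_;
};

// Usage sketch: the same class built from MHAP-like and PAF-like fields.
void overlap_example() {
    Overlap from_mhap(1, 2, 0.1, 42, 0, 10, 110, 500, 1, 20, 120, 600);
    assert(from_mhap.query_name() == "1");
    assert(!from_mhap.same_strand());
    assert(from_mhap.query_span() == 100);

    Overlap from_paf("q1", 2, 500, 10, 110, '+', "t1", 2, 600, 20, 120, 90, 100, 60);
    assert(from_paf.target_name() == "t1");
    assert(from_paf.same_strand());
    assert(from_paf.target_span() == 100);
}
```

Because the two signatures differ in parameter count, overload resolution is never ambiguous between them.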

    // define a class for alignments in SAM format
    class Example4 {
    public:
        // required signature for the constructor
        Example4(
            const char* q_name, std::uint32_t q_name_length,
            std::uint32_t flag,
            const char* t_name, std::uint32_t t_name_length,
            std::uint32_t t_begin,
            std::uint32_t mapping_quality,
            const char* cigar, std::uint32_t cigar_length,
            const char* t_next_name, std::uint32_t t_next_name_length,
            std::uint32_t t_next_begin,
            std::uint32_t template_length,
            const char* sequence, std::uint32_t sequence_length,
            const char* quality, std::uint32_t quality_length) {
            // your implementation
        }
    };

    std::vector<std::unique_ptr<Example4>> sam_objects;
    auto sam_parser = bioparser::createParser<bioparser::SamParser, Example4>(path_to_file5);
    sam_parser->parse(sam_objects, -1);
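
A SAM constructor body will often need to interpret the `flag` argument. The sketch below stores a few fields and decodes two FLAG bits defined by the SAM specification (0x4 for an unmapped read, 0x10 for a reverse-complemented read). The class name `Alignment`, its members, and the choice of which fields to keep are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative class with the SAM constructor signature shown above;
// parameters that this sketch does not store are left unnamed.
class Alignment {
public:
    Alignment(
        const char* q_name, std::uint32_t q_name_length,
        std::uint32_t flag,
        const char* t_name, std::uint32_t t_name_length,
        std::uint32_t t_begin,
        std::uint32_t mapping_quality,
        const char* /*cigar*/, std::uint32_t /*cigar_length*/,
        const char* /*t_next_name*/, std::uint32_t /*t_next_name_length*/,
        std::uint32_t /*t_next_begin*/,
        std::uint32_t /*template_length*/,
        const char* /*sequence*/, std::uint32_t /*sequence_length*/,
        const char* /*quality*/, std::uint32_t /*quality_length*/)
            : q_name_(q_name, q_name_length), t_name_(t_name, t_name_length),
            flag_(flag), t_begin_(t_begin), mapping_quality_(mapping_quality) {
    }

    // FLAG bit 0x4: the read is unmapped
    bool is_unmapped() const { return (flag_ & 0x4) != 0; }
    // FLAG bit 0x10: the read is reverse complemented
    bool is_reverse() const { return (flag_ & 0x10) != 0; }

    const std::string& query_name() const { return q_name_; }
    const std::string& target_name() const { return t_name_; }
    std::uint32_t target_begin() const { return t_begin_; }
    std::uint32_t mapping_quality() const { return mapping_quality_; }

private:
    std::string q_name_;
    std::string t_name_;
    std::uint32_t flag_;
    std::uint32_t t_begin_;
    std::uint32_t mapping_quality_;
};

// Usage sketch: flag 16 marks a mapped read on the reverse strand.
void alignment_example() {
    Alignment a("read1", 5, 16, "chr1", 4, 100, 60, "4M", 2,
        "*", 1, 0, 0, "ACGT", 4, "IIII", 4);
    assert(a.query_name() == "read1");
    assert(a.is_reverse());
    assert(!a.is_unmapped());
}
```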

If your class has a **private** constructor with the required signature, declare the corresponding parser as a friend of the class:

    class Example1 {
        friend bioparser::FastaParser<Example1>;
    private:
        Example1(...) { /* your implementation */ }
    };

    class Example2 {
        friend bioparser::FastqParser<Example2>;
    private:
        Example2(...) { /* your implementation */ }
    };

    class Example3 {
        friend bioparser::MhapParser<Example3>;
        friend bioparser::PafParser<Example3>;
    private:
        Example3(...) { /* your implementation */ }
    };

    class Example4 {
        friend bioparser::SamParser<Example4>;
    private:
        Example4(...) { /* your implementation */ }
    };
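
The friend declaration matters because the parser, not your code, calls the constructor. The sketch below reproduces that situation without Bioparser: `Factory` is a hypothetical class template standing in for a parser, and `Read` grants it access to a private constructor in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for a parser template: it creates objects of type T
// by calling T's constructor, as a parser would for every record.
template<class T>
class Factory {
public:
    std::unique_ptr<T> create(const char* name, std::uint32_t name_length) const {
        return std::unique_ptr<T>(new T(name, name_length));
    }
};

class Read {
public:
    // grants Factory<Read> access to the private constructor
    friend Factory<Read>;

    const std::string& name() const { return name_; }

private:
    Read(const char* name, std::uint32_t name_length)
            : name_(name, name_length) {
    }

    std::string name_;
};

// Usage sketch: only the befriended factory can construct a Read.
void private_constructor_example() {
    Factory<Read> factory;
    auto read = factory.create("read1", 5);
    assert(read->name() == "read1");
    // Read direct("read1", 5);  // would not compile: the constructor is private
}
```

Friendship is not transitive, so a call to `std::make_unique<Read>(...)` inside `create` would not compile; the sketch therefore uses `new` directly inside the befriended class.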