getData
=======

Computational biology as a science could, in principle, work without data.
But then one cannot prove anything, nor learn from applying the computer
to some new data.  There are many reasons why Debian cares about
bioinformatics. If only as a contribution to the education of our
students at all levels, something we cannot talk about enough, there
needs to be an easy, straightforward way to get access to data. The
professionals around us additionally need extra features, such as
automated updates and whatever else helps to maintain those data files
in the daily routine.

There are various approaches to this. Many say that there should be data
packages available, containing tarballs or some other format of the data
to be installed. This is very reasonable indeed. But because of the
release cycle that comes with ours and basically every other Linux
distribution, this concept is doomed: nobody wants to use old biological
databases when a new version is available at basically the same cost.

This package instead sets out to prepare the installation and maintenance
of databases automatically, directly from the data creators' websites
and redistributed as a Debian package.  There may still be packages
named after a particular database, like 'uniprot', but those should
then merely provide the instructions to this new 'getData' tool on how
to handle the download, indexing and updating of that database.

Downloads of all the databases listed in getData should remain
functional. The major challenge is the integration with the
post-processing of the data. In this respect, the entries for swiss.dat
(the manually curated fraction of the UniProt protein sequence database)
and trembl.dat (automated translations of coding sequences in the EMBL
nucleotide sequence database) should already perform well. For sites
that have the EMBOSS toolkit installed, the respective indexing is
performed as well.
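
The split between download and optional post-processing described above
can be illustrated with a small sketch. It is not getData's actual
implementation or configuration format: the DatabaseEntry structure, the
plan() function and the example.org URL are hypothetical, and the choice of
the EMBOSS program dbxflat as the indexer is an assumption made only for
illustration.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DatabaseEntry:
    """Hypothetical per-database instructions (not getData's real format)."""
    name: str
    source_url: str                 # placeholder URL, an assumption
    indexer: Optional[str] = None   # external program needed for indexing


def plan(entry: DatabaseEntry,
         which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the ordered steps for one database.

    The download is always planned; indexing is planned only when the
    required program is found on the PATH.
    """
    steps = [f"download {entry.source_url}"]
    if entry.indexer is None:
        return steps
    if which(entry.indexer):
        steps.append(f"index {entry.name} with {entry.indexer}")
    else:
        steps.append(f"skip indexing: {entry.indexer} not installed")
    return steps


if __name__ == "__main__":
    # Assumed example entry; the URL is a placeholder, not a real source.
    swiss = DatabaseEntry("swiss.dat", "https://example.org/swiss.dat.gz",
                          indexer="dbxflat")
    for step in plan(swiss):
        print(step)
```

Passing the lookup function as a parameter keeps the sketch testable on
machines without EMBOSS installed; the sketch only plans the steps and does
not download or index anything.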

Once the package has hit unstable, we intend to ship the configuration
files together with database-specific tools or database install packages.
Please help with your direct contributions and/or feedback.

'getData' can be successful only when there is strong communication
among ourselves concerning new tools on the horizon that should possibly
be added. And when there is a new URL for a particular pathway, then
this should be updated in some community effort.  So, please, shout at
us by running "reportbug getdata" whenever there is something to report.
While the development of this tool was indeed seeded by the Debian Med
community, there is strong hope that folks at Fedora and OpenSuSE
will adopt this package from us. We'll see.

Many thanks!!!

Steffen and Charles

Acknowledgements
----------------

This work was partially supported by the EU FP6 project "KnowARC", which
aimed at preparing computational grids also for smaller research groups
in bioinformatics. Read more at http://www.knowarc.eu.

-- Steffen Moeller and Charles Plessy <{moeller,plessy}@debian.org>  Fri, 05 Nov 2010 13:58:32 +0100