New Upstream Snapshot - r-cran-rentrez

Summary

Merged new upstream version: 1.2.3+git20201111.1.a225f21+dfsg (was: 1.2.3+dfsg).

Resulting package

Built on 2022-11-18T16:39 (took 8m4s)

If you have the apt repository enabled, the resulting binary package can be installed by running:

apt install -t fresh-snapshots r-cran-rentrez

Diff

diff --git a/DESCRIPTION b/DESCRIPTION
index a3a82f3..e73654c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -27,10 +27,8 @@ License: MIT + file LICENSE
 RoxygenNote: 7.1.1
 Encoding: UTF-8
 NeedsCompilation: no
-Packaged: 2020-11-10 20:43:06 UTC; david
+Packaged: 2022-11-18 16:35:44 UTC; root
 Author: David Winter [aut, cre] (<https://orcid.org/0000-0002-6165-0029>),
   Scott Chamberlain [ctb] (<https://orcid.org/0000-0003-1444-9135>),
   Han Guangchun [ctb] (<https://orcid.org/0000-0001-9277-2507>)
 Maintainer: David Winter <david.winter@gmail.com>
-Repository: CRAN
-Date/Publication: 2020-11-10 21:10:02 UTC
diff --git a/MD5 b/MD5
deleted file mode 100644
index 93affce..0000000
--- a/MD5
+++ /dev/null
@@ -1,55 +0,0 @@
-ac49218ba621fe790d568187bf7b9133 *DESCRIPTION
-ce28e10e378a31d3a383fab9edec5b0c *LICENSE
-604e68831f408f3fe00d376c2573ad46 *NAMESPACE
-3fc7bcb779433637b3a6b436fe5f0603 *NEWS
-a917b8bcbc84e6283def96e3373f5b66 *R/api_keys.r
-0d571017465219dce78be61676fb94fe *R/base.r
-777ca4f0d63931ba569becee1f70c521 *R/entrez_citmatch.r
-8298457c212b00c32efc50eb96a50f25 *R/entrez_fetch.r
-8ed15d94393d13bc165abdb8c9f984af *R/entrez_global_query.r
-c9a6ae0f47dd04fe5416f01064acb9ac *R/entrez_info.r
-a7b3cecf18a3f0bd75f208182ffe3476 *R/entrez_link.r
-90d9a4e1354d6bca4d1ca9b99edcc7ff *R/entrez_post.r
-dc241aa8174aa44e5b067cfb2a305dbc *R/entrez_search.r
-4fbb5b52c21a59857a9b547fb315a01b *R/entrez_summary.r
-45e66920a9037c06a69c2842993dd7ed *R/help.r
-4489283bc02cf4b34a4dbd19073ac0fa *R/parse_pubmed_xml.r
-e50c8eb1b4af22f0f25d9f1d23dd5fec *build/vignette.rds
-80141bd00c004447a7e9c87eb2d0cdab *inst/CITATION
-9ed04a54afb1e2af6150c623bf3afea8 *inst/doc/rentrez_tutorial.R
-ad0e3c99b177780640d4148cdd31e6ec *inst/doc/rentrez_tutorial.Rmd
-c235d8e8b736100c4da1c39a5281ea0c *inst/doc/rentrez_tutorial.html
-3faeb3dee943e9d3dcd5a0a6dfba229d *man/entrez_citmatch.Rd
-a43864640cacc1b7f56a8b5e554cee19 *man/entrez_db_links.Rd
-955c8f98820f1d482501d4b400d589e5 *man/entrez_db_searchable.Rd
-5341bbd5e142594f5cc538dd9dd2e00f *man/entrez_db_summary.Rd
-3c8a11b6e03bb2e35b8ce5ee0419395a *man/entrez_dbs.Rd
-7561592c6489323d76001d50953eb324 *man/entrez_fetch.Rd
-063386cbb01aa78c3779a27a7a2759a5 *man/entrez_global_query.Rd
-d343f7cbd8518194ed108cc3f11a7af2 *man/entrez_info.Rd
-12c7823eb119d4887cd2df20ede36534 *man/entrez_link.Rd
-4646c9cc920cd71dd7ee4d97ef77ee55 *man/entrez_post.Rd
-f0423889d6db08ecc204a420d1b2d81c *man/entrez_search.Rd
-c6cb6273fb9cfafa459eac8ad3d2859a *man/entrez_summary.Rd
-610deb84730a5f32b1d8310d830ba918 *man/extract_from_esummary.Rd
-d563be6b5b0c28e82f1f9837b7f33d2c *man/linkout_urls.Rd
-8081b4bfcc61c745e3a678fc5a276086 *man/parse_pubmed_xml.Rd
-5937fa00f82113975519abd74b4dc47b *man/rentrez.Rd
-08509834a23432d8bb70d64f61724d16 *man/set_entrez_key.Rd
-db04e7147a14d952e0ae8c93d1390087 *tests/test-all.R
-e45d1b98bb0b83b5c7149796f1ad265b *tests/testthat/test_api_key.r
-0c4b51d40ae63cbfdcfac31cd67edb96 *tests/testthat/test_citmatch.r
-4edd85844f931fee501b87861537459c *tests/testthat/test_docs.r
-0fcea21cbd133577b1f8254e5d678b35 *tests/testthat/test_fetch.r
-a4c45c8f355eafbc660aede214e6f526 *tests/testthat/test_httr.r
-90a64274ba59f7232cf0adc3cbe8e86e *tests/testthat/test_httr_post.r
-6f1a4c681ca3b43318b45fdf4a87221f *tests/testthat/test_info.r
-f7f1fe31b6a902289daae7fe6e1b3554 *tests/testthat/test_link.r
-506192f73a932bf10c7c7861cc060d8a *tests/testthat/test_net.r
-143fe8e1fe9808bf01f07ff5fc50207e *tests/testthat/test_parse.r
-546ef30875b95832a49c3776420b8e79 *tests/testthat/test_post.r
-ae5359bee5086c5748a01c6a317a8b0a *tests/testthat/test_query.r
-b1cd181fce9fe7127a68c34b145f3b59 *tests/testthat/test_search.r
-1c8a9db4e455e65076f86f2c049e85a5 *tests/testthat/test_summary.r
-c435f7927e6e2f015bf43780f4957e1e *tests/testthat/test_webenv.r
-ad0e3c99b177780640d4148cdd31e6ec *vignettes/rentrez_tutorial.Rmd
diff --git a/NEWS b/NEWS
index 458f5fb..6bb333d 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,17 @@
 Version 1.2.3
 -----------------
-Maintanence release, mostly to prevent issues with rate-limiting errors when the
-package is tested in CRAN
+Maintenance release, mostly to prevent issues with rate-limiting errors when the
+package is tested in CRAN. 
+
+* The sleep commands for rate-limiting are slightly increased
+
+* As of this release, the vignette is NOT build by default (to avoid issues with
+  automated tests on CRAN). This will not affect most users, but a developers
+  may want to read a wiki page describing how to build the vignette:
+
+https://github.com/ropensci/rentrez/wiki/Building-the-rentrez-tutorial-vignette.
+
+
 
 Version 1.2.2
 ------------------
diff --git a/R/entrez_citmatch.r b/R/entrez_citmatch.r
old mode 100755
new mode 100644
diff --git a/R/entrez_info.r b/R/entrez_info.r
old mode 100755
new mode 100644
diff --git a/build/vignette.rds b/build/vignette.rds
index 0d0df7c..e28b962 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/debian/changelog b/debian/changelog
index 4856765..2792e13 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+r-cran-rentrez (1.2.3+git20201111.1.a225f21+dfsg-1) UNRELEASED; urgency=low
+
+  * New upstream snapshot.
+
+ -- Debian Janitor <janitor@jelmer.uk>  Fri, 18 Nov 2022 16:35:57 -0000
+
 r-cran-rentrez (1.2.3+dfsg-1) unstable; urgency=medium
 
   * Team upload.
diff --git a/inst/doc/rentrez_tutorial.R b/inst/doc/rentrez_tutorial.R
index eb3086c..d925c79 100644
--- a/inst/doc/rentrez_tutorial.R
+++ b/inst/doc/rentrez_tutorial.R
@@ -14,182 +14,182 @@ count_recs <- function(db, denom) {
 }
 
 ## ---- dbs---------------------------------------------------------------------
-entrez_dbs()
+#  entrez_dbs()
 
 ## ---- cdd---------------------------------------------------------------------
-entrez_db_summary("cdd")
+#  entrez_db_summary("cdd")
 
 ## ---- sra_eg------------------------------------------------------------------
-entrez_db_searchable("sra")
+#  entrez_db_searchable("sra")
 
 ## ----eg_search----------------------------------------------------------------
-r_search <- entrez_search(db="pubmed", term="R Language")
+#  r_search <- entrez_search(db="pubmed", term="R Language")
 
 ## ----print_search-------------------------------------------------------------
-r_search
+#  r_search
 
 ## ----search_ids---------------------------------------------------------------
-r_search$ids
+#  r_search$ids
 
 ## ----searchids_2--------------------------------------------------------------
-another_r_search <- entrez_search(db="pubmed", term="R Language", retmax=40)
-another_r_search
+#  another_r_search <- entrez_search(db="pubmed", term="R Language", retmax=40)
+#  another_r_search
 
 ## ---- Tt----------------------------------------------------------------------
-entrez_search(db="sra",
-              term="Tetrahymena thermophila[ORGN]",
-              retmax=0)
+#  entrez_search(db="sra",
+#                term="Tetrahymena thermophila[ORGN]",
+#                retmax=0)
 
 ## ---- Tt2---------------------------------------------------------------------
-entrez_search(db="sra",
-              term="Tetrahymena thermophila[ORGN] AND 2013:2015[PDAT]",
-              retmax=0)
+#  entrez_search(db="sra",
+#                term="Tetrahymena thermophila[ORGN] AND 2013:2015[PDAT]",
+#                retmax=0)
 
 ## ---- Tt3---------------------------------------------------------------------
-entrez_search(db="sra",
-              term="(Tetrahymena thermophila[ORGN] OR Tetrahymena borealis[ORGN]) AND 2013:2015[PDAT]",
-              retmax=0)
+#  entrez_search(db="sra",
+#                term="(Tetrahymena thermophila[ORGN] OR Tetrahymena borealis[ORGN]) AND 2013:2015[PDAT]",
+#                retmax=0)
 
 ## ---- sra_searchable----------------------------------------------------------
-entrez_db_searchable("sra")
+#  entrez_db_searchable("sra")
 
 ## ---- mesh--------------------------------------------------------------------
-entrez_search(db   = "pubmed",
-              term = "(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])")
+#  entrez_search(db   = "pubmed",
+#                term = "(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])")
 
 ## ---- connectome, fig.width=5, fig.height=4, fig.align='center'---------------
-search_year <- function(year, term){
-    query <- paste(term, "AND (", year, "[PDAT])")
-    entrez_search(db="pubmed", term=query, retmax=0)$count
-}
-
-year <- 2008:2014
-papers <- sapply(year, search_year, term="Connectome", USE.NAMES=FALSE)
-
-plot(year, papers, type='b', main="The Rise of the Connectome")
+#  search_year <- function(year, term){
+#      query <- paste(term, "AND (", year, "[PDAT])")
+#      entrez_search(db="pubmed", term=query, retmax=0)$count
+#  }
+#  
+#  year <- 2008:2014
+#  papers <- sapply(year, search_year, term="Connectome", USE.NAMES=FALSE)
+#  
+#  plot(year, papers, type='b', main="The Rise of the Connectome")
 
 ## ----elink0-------------------------------------------------------------------
-all_the_links <- entrez_link(dbfrom='gene', id=351, db='all')
-all_the_links
+#  all_the_links <- entrez_link(dbfrom='gene', id=351, db='all')
+#  all_the_links
 
 ## ----elink_link---------------------------------------------------------------
-all_the_links$links
+#  all_the_links$links
 
 ## ---- elink_pmc---------------------------------------------------------------
-all_the_links$links$gene_pmc[1:10]
+#  all_the_links$links$gene_pmc[1:10]
 
 ## ---- elink_omim--------------------------------------------------------------
-all_the_links$links$gene_clinvar
-
+#  all_the_links$links$gene_clinvar
+#  
 
 ## ---- elink1------------------------------------------------------------------
-nuc_links <- entrez_link(dbfrom='gene', id=351, db='nuccore')
-nuc_links
-nuc_links$links
+#  nuc_links <- entrez_link(dbfrom='gene', id=351, db='nuccore')
+#  nuc_links
+#  nuc_links$links
 
 ## ---- elinik_refseqs----------------------------------------------------------
-nuc_links$links$gene_nuccore_refseqrna
+#  nuc_links$links$gene_nuccore_refseqrna
 
 ## ---- outlinks----------------------------------------------------------------
-paper_links <- entrez_link(dbfrom="pubmed", id=25500142, cmd="llinks")
-paper_links
+#  paper_links <- entrez_link(dbfrom="pubmed", id=25500142, cmd="llinks")
+#  paper_links
 
 ## ---- urls--------------------------------------------------------------------
-paper_links$linkouts
+#  paper_links$linkouts
 
 ## ----just_urls----------------------------------------------------------------
-linkout_urls(paper_links)
+#  linkout_urls(paper_links)
 
 ## ---- multi_default-----------------------------------------------------------
-all_links_together  <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"))
-all_links_together
-all_links_together$links$gene_protein
+#  all_links_together  <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"))
+#  all_links_together
+#  all_links_together$links$gene_protein
 
 ## ---- multi_byid--------------------------------------------------------------
-all_links_sep  <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"), by_id=TRUE)
-all_links_sep
-lapply(all_links_sep, function(x) x$links$gene_protein)
+#  all_links_sep  <- entrez_link(db="protein", dbfrom="gene", id=c("93100", "223646"), by_id=TRUE)
+#  all_links_sep
+#  lapply(all_links_sep, function(x) x$links$gene_protein)
 
 ## ---- Summ_1------------------------------------------------------------------
-taxize_summ <- entrez_summary(db="pubmed", id=24555091)
-taxize_summ
+#  taxize_summ <- entrez_summary(db="pubmed", id=24555091)
+#  taxize_summ
 
 ## ---- Summ_2------------------------------------------------------------------
-taxize_summ$articleids
+#  taxize_summ$articleids
 
 ## ---- Summ_3------------------------------------------------------------------
-taxize_summ$pmcrefcount
+#  taxize_summ$pmcrefcount
 
 ## ---- multi_summ--------------------------------------------------------------
-vivax_search <- entrez_search(db = "pubmed",
-                              term = "(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])")
-multi_summs <- entrez_summary(db="pubmed", id=vivax_search$ids)
+#  vivax_search <- entrez_search(db = "pubmed",
+#                                term = "(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])")
+#  multi_summs <- entrez_summary(db="pubmed", id=vivax_search$ids)
 
 ## ---- multi_summ2-------------------------------------------------------------
-extract_from_esummary(multi_summs, "fulljournalname")
+#  extract_from_esummary(multi_summs, "fulljournalname")
 
 ## ---- multi_summ3-------------------------------------------------------------
-date_and_cite <- extract_from_esummary(multi_summs, c("pubdate", "pmcrefcount",  "title"))
-knitr::kable(head(t(date_and_cite)), row.names=FALSE)
+#  date_and_cite <- extract_from_esummary(multi_summs, c("pubdate", "pmcrefcount",  "title"))
+#  knitr::kable(head(t(date_and_cite)), row.names=FALSE)
 
 ## ---- transcript_ids----------------------------------------------------------
-gene_ids <- c(351, 11647)
-linked_seq_ids <- entrez_link(dbfrom="gene", id=gene_ids, db="nuccore")
-linked_transripts <- linked_seq_ids$links$gene_nuccore_refseqrna
-head(linked_transripts)
+#  gene_ids <- c(351, 11647)
+#  linked_seq_ids <- entrez_link(dbfrom="gene", id=gene_ids, db="nuccore")
+#  linked_transripts <- linked_seq_ids$links$gene_nuccore_refseqrna
+#  head(linked_transripts)
 
 ## ----fetch_fasta--------------------------------------------------------------
-all_recs <- entrez_fetch(db="nuccore", id=linked_transripts, rettype="fasta")
-class(all_recs)
-nchar(all_recs)
+#  all_recs <- entrez_fetch(db="nuccore", id=linked_transripts, rettype="fasta")
+#  class(all_recs)
+#  nchar(all_recs)
 
 ## ---- peak--------------------------------------------------------------------
-cat(strwrap(substr(all_recs, 1, 500)), sep="\n")
+#  cat(strwrap(substr(all_recs, 1, 500)), sep="\n")
 
 ## ---- Tt_tax------------------------------------------------------------------
-Tt <- entrez_search(db="taxonomy", term="(Tetrahymena thermophila[ORGN]) AND Species[RANK]")
-tax_rec <- entrez_fetch(db="taxonomy", id=Tt$ids, rettype="xml", parsed=TRUE)
-class(tax_rec)
+#  Tt <- entrez_search(db="taxonomy", term="(Tetrahymena thermophila[ORGN]) AND Species[RANK]")
+#  tax_rec <- entrez_fetch(db="taxonomy", id=Tt$ids, rettype="xml", parsed=TRUE)
+#  class(tax_rec)
 
 ## ---- Tt_list-----------------------------------------------------------------
-tax_list <- XML::xmlToList(tax_rec)
-tax_list$Taxon$GeneticCode
+#  tax_list <- XML::xmlToList(tax_rec)
+#  tax_list$Taxon$GeneticCode
 
 ## ---- Tt_path-----------------------------------------------------------------
-tt_lineage <- tax_rec["//LineageEx/Taxon/ScientificName"]
-tt_lineage[1:4]
+#  tt_lineage <- tax_rec["//LineageEx/Taxon/ScientificName"]
+#  tt_lineage[1:4]
 
 ## ---- Tt_apply----------------------------------------------------------------
-XML::xpathSApply(tax_rec, "//LineageEx/Taxon/ScientificName", XML::xmlValue)
+#  XML::xpathSApply(tax_rec, "//LineageEx/Taxon/ScientificName", XML::xmlValue)
 
 ## ---- asthma------------------------------------------------------------------
-upload <- entrez_post(db="omim", id=600807)
-upload
+#  upload <- entrez_post(db="omim", id=600807)
+#  upload
 
 ## ---- snail_search------------------------------------------------------------
-entrez_search(db="nuccore", term="COI[Gene] AND Gastropoda[ORGN]")
+#  entrez_search(db="nuccore", term="COI[Gene] AND Gastropoda[ORGN]")
 
 ## ---- snail_history-----------------------------------------------------------
-snail_coi <- entrez_search(db="nuccore", term="COI[Gene] AND Gastropoda[ORGN]", use_history=TRUE)
-snail_coi
-snail_coi$web_history
+#  snail_coi <- entrez_search(db="nuccore", term="COI[Gene] AND Gastropoda[ORGN]", use_history=TRUE)
+#  snail_coi
+#  snail_coi$web_history
 
 ## ---- asthma_links------------------------------------------------------------
-asthma_clinvar <- entrez_link(dbfrom="omim", db="clinvar", cmd="neighbor_history", id=600807)
-asthma_clinvar$web_histories
+#  asthma_clinvar <- entrez_link(dbfrom="omim", db="clinvar", cmd="neighbor_history", id=600807)
+#  asthma_clinvar$web_histories
 
 ## ---- asthma_links_upload-----------------------------------------------------
-asthma_variants <- entrez_link(dbfrom="omim", db="clinvar", cmd="neighbor_history", web_history=upload)
-asthma_variants
+#  asthma_variants <- entrez_link(dbfrom="omim", db="clinvar", cmd="neighbor_history", web_history=upload)
+#  asthma_variants
 
 ## ---- links-------------------------------------------------------------------
-snp_links <- entrez_link(dbfrom="clinvar", db="snp", 
-                         web_history=asthma_variants$web_histories$omim_clinvar,
-                         cmd="neighbor_history")
-snp_summ <- entrez_summary(db="snp", web_history=snp_links$web_histories$clinvar_snp)
-knitr::kable(extract_from_esummary(snp_summ, c("chr", "fxn_class", "global_maf")))
+#  snp_links <- entrez_link(dbfrom="clinvar", db="snp",
+#                           web_history=asthma_variants$web_histories$omim_clinvar,
+#                           cmd="neighbor_history")
+#  snp_summ <- entrez_summary(db="snp", web_history=snp_links$web_histories$clinvar_snp)
+#  knitr::kable(extract_from_esummary(snp_summ, c("chr", "fxn_class", "global_maf")))
 
 ## ---- set_key-----------------------------------------------------------------
-set_entrez_key("ABCD123")
-Sys.getenv("ENTREZ_KEY")
+#  set_entrez_key("ABCD123")
+#  Sys.getenv("ENTREZ_KEY")
 
diff --git a/inst/doc/rentrez_tutorial.html b/inst/doc/rentrez_tutorial.html
new file mode 100644
index 0000000..eb90d6b
--- /dev/null
+++ b/inst/doc/rentrez_tutorial.html
@@ -0,0 +1,976 @@
+<!DOCTYPE html>
+
+<html>
+
+<head>
+
+<meta charset="utf-8" />
+<meta name="generator" content="pandoc" />
+<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+
+<meta name="author" content="David winter" />
+
+<meta name="date" content="2022-11-18" />
+
+<title>Rentrez Tutorial</title>
+
+<script>// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
+// be compatible with the behavior of Pandoc < 2.8).
+document.addEventListener('DOMContentLoaded', function(e) {
+  var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
+  var i, h, a;
+  for (i = 0; i < hs.length; i++) {
+    h = hs[i];
+    if (!/^h[1-6]$/i.test(h.tagName)) continue;  // it should be a header h1-h6
+    a = h.attributes;
+    while (a.length > 0) h.removeAttribute(a[0].name);
+  }
+});
+</script>
+
+<style type="text/css">
+  code{white-space: pre-wrap;}
+  span.smallcaps{font-variant: small-caps;}
+  span.underline{text-decoration: underline;}
+  div.column{display: inline-block; vertical-align: top; width: 50%;}
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+  ul.task-list{list-style: none;}
+    </style>
+
+
+
+<style type="text/css">
+  code {
+    white-space: pre;
+  }
+  .sourceCode {
+    overflow: visible;
+  }
+</style>
+<style type="text/css" data-origin="pandoc">
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+  { counter-reset: source-line 0; }
+pre.numberSource code > span
+  { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+  { content: counter(source-line);
+    position: relative; left: -1em; text-align: right; vertical-align: baseline;
+    border: none; display: inline-block;
+    -webkit-touch-callout: none; -webkit-user-select: none;
+    -khtml-user-select: none; -moz-user-select: none;
+    -ms-user-select: none; user-select: none;
+    padding: 0 4px; width: 4em;
+    color: #aaaaaa;
+  }
+pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
+div.sourceCode
+  {   }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
+code span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code span.at { color: #7d9029; } /* Attribute */
+code span.bn { color: #40a070; } /* BaseN */
+code span.bu { color: #008000; } /* BuiltIn */
+code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code span.ch { color: #4070a0; } /* Char */
+code span.cn { color: #880000; } /* Constant */
+code span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code span.dt { color: #902000; } /* DataType */
+code span.dv { color: #40a070; } /* DecVal */
+code span.er { color: #ff0000; font-weight: bold; } /* Error */
+code span.ex { } /* Extension */
+code span.fl { color: #40a070; } /* Float */
+code span.fu { color: #06287e; } /* Function */
+code span.im { color: #008000; font-weight: bold; } /* Import */
+code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+code span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code span.op { color: #666666; } /* Operator */
+code span.ot { color: #007020; } /* Other */
+code span.pp { color: #bc7a00; } /* Preprocessor */
+code span.sc { color: #4070a0; } /* SpecialChar */
+code span.ss { color: #bb6688; } /* SpecialString */
+code span.st { color: #4070a0; } /* String */
+code span.va { color: #19177c; } /* Variable */
+code span.vs { color: #4070a0; } /* VerbatimString */
+code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+
+</style>
+<script>
+// apply pandoc div.sourceCode style to pre.sourceCode instead
+(function() {
+  var sheets = document.styleSheets;
+  for (var i = 0; i < sheets.length; i++) {
+    if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
+    try { var rules = sheets[i].cssRules; } catch (e) { continue; }
+    var j = 0;
+    while (j < rules.length) {
+      var rule = rules[j];
+      // check if there is a div.sourceCode rule
+      if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") {
+        j++;
+        continue;
+      }
+      var style = rule.style.cssText;
+      // check if color or background-color is set
+      if (rule.style.color === '' && rule.style.backgroundColor === '') {
+        j++;
+        continue;
+      }
+      // replace div.sourceCode by a pre.sourceCode rule
+      sheets[i].deleteRule(j);
+      sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
+    }
+  }
+})();
+</script>
+
+
+
+
+<style type="text/css">body {
+background-color: #fff;
+margin: 1em auto;
+max-width: 700px;
+overflow: visible;
+padding-left: 2em;
+padding-right: 2em;
+font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
+font-size: 14px;
+line-height: 1.35;
+}
+#TOC {
+clear: both;
+margin: 0 0 10px 10px;
+padding: 4px;
+width: 400px;
+border: 1px solid #CCCCCC;
+border-radius: 5px;
+background-color: #f6f6f6;
+font-size: 13px;
+line-height: 1.3;
+}
+#TOC .toctitle {
+font-weight: bold;
+font-size: 15px;
+margin-left: 5px;
+}
+#TOC ul {
+padding-left: 40px;
+margin-left: -1.5em;
+margin-top: 5px;
+margin-bottom: 5px;
+}
+#TOC ul ul {
+margin-left: -2em;
+}
+#TOC li {
+line-height: 16px;
+}
+table {
+margin: 1em auto;
+border-width: 1px;
+border-color: #DDDDDD;
+border-style: outset;
+border-collapse: collapse;
+}
+table th {
+border-width: 2px;
+padding: 5px;
+border-style: inset;
+}
+table td {
+border-width: 1px;
+border-style: inset;
+line-height: 18px;
+padding: 5px 5px;
+}
+table, table th, table td {
+border-left-style: none;
+border-right-style: none;
+}
+table thead, table tr.even {
+background-color: #f7f7f7;
+}
+p {
+margin: 0.5em 0;
+}
+blockquote {
+background-color: #f6f6f6;
+padding: 0.25em 0.75em;
+}
+hr {
+border-style: solid;
+border: none;
+border-top: 1px solid #777;
+margin: 28px 0;
+}
+dl {
+margin-left: 0;
+}
+dl dd {
+margin-bottom: 13px;
+margin-left: 13px;
+}
+dl dt {
+font-weight: bold;
+}
+ul {
+margin-top: 0;
+}
+ul li {
+list-style: circle outside;
+}
+ul ul {
+margin-bottom: 0;
+}
+pre, code {
+background-color: #f7f7f7;
+border-radius: 3px;
+color: #333;
+white-space: pre-wrap; 
+}
+pre {
+border-radius: 3px;
+margin: 5px 0px 10px 0px;
+padding: 10px;
+}
+pre:not([class]) {
+background-color: #f7f7f7;
+}
+code {
+font-family: Consolas, Monaco, 'Courier New', monospace;
+font-size: 85%;
+}
+p > code, li > code {
+padding: 2px 0px;
+}
+div.figure {
+text-align: center;
+}
+img {
+background-color: #FFFFFF;
+padding: 2px;
+border: 1px solid #DDDDDD;
+border-radius: 3px;
+border: 1px solid #CCCCCC;
+margin: 0 5px;
+}
+h1 {
+margin-top: 0;
+font-size: 35px;
+line-height: 40px;
+}
+h2 {
+border-bottom: 4px solid #f7f7f7;
+padding-top: 10px;
+padding-bottom: 2px;
+font-size: 145%;
+}
+h3 {
+border-bottom: 2px solid #f7f7f7;
+padding-top: 10px;
+font-size: 120%;
+}
+h4 {
+border-bottom: 1px solid #f7f7f7;
+margin-left: 8px;
+font-size: 105%;
+}
+h5, h6 {
+border-bottom: 1px solid #ccc;
+font-size: 105%;
+}
+a {
+color: #0033dd;
+text-decoration: none;
+}
+a:hover {
+color: #6666ff; }
+a:visited {
+color: #800080; }
+a:visited:hover {
+color: #BB00BB; }
+a[href^="http:"] {
+text-decoration: underline; }
+a[href^="https:"] {
+text-decoration: underline; }
+
+code > span.kw { color: #555; font-weight: bold; } 
+code > span.dt { color: #902000; } 
+code > span.dv { color: #40a070; } 
+code > span.bn { color: #d14; } 
+code > span.fl { color: #d14; } 
+code > span.ch { color: #d14; } 
+code > span.st { color: #d14; } 
+code > span.co { color: #888888; font-style: italic; } 
+code > span.ot { color: #007020; } 
+code > span.al { color: #ff0000; font-weight: bold; } 
+code > span.fu { color: #900; font-weight: bold; } 
+code > span.er { color: #a61717; background-color: #e3d2d2; } 
+</style>
+
+
+
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Rentrez Tutorial</h1>
+<h4 class="author">David winter</h4>
+<h4 class="date">2022-11-18</h4>
+
+
+<div id="TOC">
+<ul>
+<li><a href="#introduction-the-ncbi-entrez-and-rentrez.">Introduction:
+The NCBI, entrez and <code>rentrez</code>.</a></li>
+<li><a href="#getting-started-with-the-rentrez">Getting started with the
+rentrez</a></li>
+<li><a href="#searching-databases-entrez_search">Searching databases:
+<code>entrez_search()</code></a>
+<ul>
+<li><a href="#building-search-terms">Building search terms</a></li>
+<li><a href="#using-the-filter-field">Using the Filter field</a></li>
+<li><a href="#advanced-counting">Advanced counting</a></li>
+</ul></li>
+<li><a href="#finding-cross-references-entrez_link">Finding
+cross-references : <code>entrez_link()</code>:</a>
+<ul>
+<li><a href="#my-god-its-full-of-links">My god, it’s full of
+links</a></li>
+<li><a href="#narrowing-our-focus">Narrowing our focus</a></li>
+<li><a href="#external-links">External links</a></li>
+<li><a href="#using-more-than-one-id">Using more than one ID</a></li>
+</ul></li>
+<li><a href="#getting-summary-data-entrez_summary">Getting summary data:
+<code>entrez_summary()</code></a>
+<ul>
+<li><a href="#the-summary-record">The summary record</a></li>
+<li><a href="#dealing-with-many-records">Dealing with many
+records</a></li>
+</ul></li>
+<li><a href="#fetching-full-records-entrez_fetch">Fetching full records:
+<code>entrez_fetch()</code></a>
+<ul>
+<li><a href="#fetch-dna-sequences-in-fasta-format">Fetch DNA sequences
+in fasta format</a></li>
+<li><a href="#fetch-a-parsed-xml-document">Fetch a parsed XML
+document</a></li>
+</ul></li>
+<li><a href="#using-ncbis-web-history-features">Using NCBI’s Web History
+features</a>
+<ul>
+<li><a href="#get-a-web_history-object-from-entrez_search-or-entrez_link">Get a
+<code>web_history</code> object from <code>entrez_search</code> or
+<code>entrez_link()</code></a></li>
+<li><a href="#use-a-web_history-object">Use a <code>web_history</code>
+object</a></li>
+</ul></li>
+<li><a href="#rate-limiting-and-api-keys">Rate-limiting and API Keys</a>
+<ul>
+<li><a href="#slowing-rentrez-down-when-you-hit-the-rate-limit">Slowing
+rentrez down when you hit the rate-limit</a></li>
+</ul></li>
+<li><a href="#what-next">What next ?</a></li>
+</ul>
+</div>
+
+<div id="introduction-the-ncbi-entrez-and-rentrez." class="section level2">
+<h2>Introduction: The NCBI, entrez and <code>rentrez</code>.</h2>
+<p>The NCBI shares a <em>lot</em> of data. At the time this document was
+compiled, there were 31.7 million papers in <a href="https://pubmed.ncbi.nlm.nih.gov/">PubMed</a>, including 6.6
+million full-text records available in <a href="https://www.ncbi.nlm.nih.gov/pmc/">PubMed Central</a>. <a href="https://www.ncbi.nlm.nih.gov/nuccore">The NCBI Nucleotide
+Database</a> (which includes GenBank) has data for 432 million different
+sequences, and <a href="https://www.ncbi.nlm.nih.gov/snp/">dbSNP</a>
+describes 702 million different genetic variants. All of these records
+can be cross-referenced with the 1.86 million species in the <a href="https://www.ncbi.nlm.nih.gov/taxonomy">NCBI taxonomy</a> or 27
+thousand disease-associated records in <a href="https://www.ncbi.nlm.nih.gov/omim">OMIM</a>.</p>
+<p>The NCBI makes this data available through a <a href="https://www.ncbi.nlm.nih.gov/">web interface</a>, an <a href="ftp://ftp.ncbi.nlm.nih.gov/">FTP server</a> and through a REST API
+called the <a href="https://www.ncbi.nlm.nih.gov/books/NBK25500/">Entrez
+Utilities</a> (<code>Eutils</code> for short). This package provides
+functions to use that API, allowing users to gather and combine data
+from multiple NCBI databases in the comfort of an R session or
+script.</p>
+</div>
+<div id="getting-started-with-the-rentrez" class="section level2">
+<h2>Getting started with the rentrez</h2>
+<p>To make the most of all the data the NCBI shares you need to know a
+little about their databases, the records they contain and the ways you
+can find those records. The <a href="https://www.ncbi.nlm.nih.gov/home/documentation/">NCBI provides
+extensive documentation for each of their databases</a> and for the <a href="https://www.ncbi.nlm.nih.gov/books/NBK25501/">EUtils API that
+<code>rentrez</code> takes advantage of</a>. There are also some helper
+functions in <code>rentrez</code> that help users learn their way around
+the NCBI’s databases.</p>
+<p>First, you can use <code>entrez_dbs()</code> to find the list of
+available databases:</p>
+<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_dbs</span>()</span></code></pre></div>
+<p>There is a set of functions with names starting
+<code>entrez_db_</code> that can be used to gather more information
+about each of these databases:</p>
+<p><strong>Functions that help you learn about NCBI
+databases</strong></p>
+<table>
+<colgroup>
+<col width="32%" />
+<col width="67%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Function name</th>
+<th>Return</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td><code>entrez_db_summary()</code></td>
+<td>Brief description of what the database is</td>
+</tr>
+<tr class="even">
+<td><code>entrez_db_searchable()</code></td>
+<td>Set of search terms that can be used with this database</td>
+</tr>
+<tr class="odd">
+<td><code>entrez_db_links()</code></td>
+<td>Set of databases that might contain linked records</td>
+</tr>
+</tbody>
+</table>
+<p>For instance, we can get a description of the somewhat cryptically
+named database ‘cdd’…</p>
+<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_db_summary</span>(<span class="st">&quot;cdd&quot;</span>)</span></code></pre></div>
+<p>… or find out which search terms can be used with the Sequence Read
+Archive (SRA) database (which contains raw data from sequencing
+projects):</p>
+<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_db_searchable</span>(<span class="st">&quot;sra&quot;</span>)</span></code></pre></div>
+<p>Just how these ‘helper’ functions might be useful will become clearer
+once you’ve started using <code>rentrez</code>, so let’s get
+started.</p>
+</div>
+<div id="searching-databases-entrez_search" class="section level2">
+<h2>Searching databases: <code>entrez_search()</code></h2>
+<p>Very often, the first thing you’ll want to do with
+<code>rentrez</code> is search a given NCBI database to find records
+that match some keywords. You can do this using the function
+<code>entrez_search()</code>. In the simplest case you just need to
+provide a database name (<code>db</code>) and a search term
+(<code>term</code>) so let’s search PubMed for articles about the
+<code>R language</code>:</p>
+<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>r_search <span class="ot">&lt;-</span> <span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;pubmed&quot;</span>, <span class="at">term=</span><span class="st">&quot;R Language&quot;</span>)</span></code></pre></div>
+<p>The object returned by a search acts like a list, and you can get a
+summary of its contents by printing it.</p>
+<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>r_search</span></code></pre></div>
+<p>There are a few things to note here. First, the NCBI’s server has
+worked out that we meant R as a programming language, and so included
+the <a href="https://www.ncbi.nlm.nih.gov/mesh">‘MeSH’ term</a>
+associated with programming languages. We’ll worry about MeSH terms and
+other special queries later; for now just note that you can use this
+feature to check that your search term was interpreted in the way you
+intended. Second, there are many more ‘hits’ for this search than there
+are unique IDs contained in this object. That’s because the optional
+argument <code>retmax</code>, which controls the maximum number of
+returned values, has a default value of 20.</p>
+<p>The IDs are the most important thing returned here. They allow us to
+fetch records matching those IDs, gather summary data about them or find
+cross-referenced records in other databases. We access the IDs as a
+vector using the <code>$</code> operator:</p>
+<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>r_search<span class="sc">$</span>ids</span></code></pre></div>
+<p>If we want to get more than 20 IDs we can do so by increasing the
+<code>retmax</code> argument.</p>
+<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>another_r_search <span class="ot">&lt;-</span> <span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;pubmed&quot;</span>, <span class="at">term=</span><span class="st">&quot;R Language&quot;</span>, <span class="at">retmax=</span><span class="dv">40</span>)</span>
+<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>another_r_search</span></code></pre></div>
+<p>If we want to get IDs for all of the thousands of records that match
+this search, we can use the NCBI’s web history feature <a href="#web_history">described below</a>.</p>
+<div id="building-search-terms" class="section level3">
+<h3>Building search terms</h3>
+<p>The EUtils API uses a special syntax to build search terms. You can
+search a database against a specific term using the format
+<code>query[SEARCH FIELD]</code>, and combine multiple such searches
+using the boolean operators <code>AND</code>, <code>OR</code> and
+<code>NOT</code>.</p>
+<p>For instance, we can find next generation sequence datasets for the
+(amazing…) ciliate <em>Tetrahymena thermophila</em> by using the
+organism (‘ORGN’) search field:</p>
+<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;sra&quot;</span>,</span>
+<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>              <span class="at">term=</span><span class="st">&quot;Tetrahymena thermophila[ORGN]&quot;</span>,</span>
+<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>              <span class="at">retmax=</span><span class="dv">0</span>)</span></code></pre></div>
+<p>We can narrow our focus to only those records that have been added
+recently (using the colon to specify a range of values):</p>
+<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;sra&quot;</span>,</span>
+<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>              <span class="at">term=</span><span class="st">&quot;Tetrahymena thermophila[ORGN] AND 2013:2015[PDAT]&quot;</span>,</span>
+<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>              <span class="at">retmax=</span><span class="dv">0</span>)</span></code></pre></div>
+<p>Or include recent records for either <em>T. thermophila</em> or its
+close relative <em>T. borealis</em> (using parentheses to make ANDs and
+ORs explicit).</p>
+<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;sra&quot;</span>,</span>
+<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>              <span class="at">term=</span><span class="st">&quot;(Tetrahymena thermophila[ORGN] OR Tetrahymena borealis[ORGN]) AND 2013:2015[PDAT]&quot;</span>,</span>
+<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>              <span class="at">retmax=</span><span class="dv">0</span>)</span></code></pre></div>
+<p>The set of search terms available varies between databases. You can
+get a list of available terms for any given database with
+<code>entrez_db_searchable()</code>:</p>
+<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_db_searchable</span>(<span class="st">&quot;sra&quot;</span>)</span></code></pre></div>
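+<p>The <code>query[SEARCH FIELD]</code> syntax and the boolean operators described above can also be assembled programmatically. The following is a minimal sketch in base R; <code>field_term()</code> and <code>combine_terms()</code> are hypothetical helpers written for this illustration and are not part of <code>rentrez</code>.</p>

```r
# Hypothetical helper (not part of rentrez): tag a query with a search field.
field_term <- function(query, field) {
  sprintf("%s[%s]", query, field)
}

# Hypothetical helper: join already-built terms with a boolean operator and
# wrap the result in parentheses so that the grouping stays explicit.
combine_terms <- function(..., op = c("AND", "OR", "NOT")) {
  op <- match.arg(op)
  terms <- c(...)
  if (length(terms) == 1) return(terms)
  paste0("(", paste(terms, collapse = paste0(" ", op, " ")), ")")
}

organisms <- combine_terms(
  field_term("Tetrahymena thermophila", "ORGN"),
  field_term("Tetrahymena borealis", "ORGN"),
  op = "OR"
)
sra_term <- combine_terms(organisms, field_term("2013:2015", "PDAT"), op = "AND")
print(sra_term)
```

+<p>The resulting string could then be supplied as the <code>term</code> argument of <code>entrez_search()</code>, in the same way as the hand-written terms above.</p>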
+</div>
+<div id="using-the-filter-field" class="section level3">
+<h3>Using the Filter field</h3>
+<p>“Filter” is a special field that, as the name suggests, allows you
+to limit records returned by a search to a set of filtering criteria.
+There is no programmatic way to find the particular terms that can be
+used with the Filter field. However, the NCBI’s website provides an
+“advanced search” tool for some databases that can be used to discover
+these terms.</p>
+<p>For example, you can find all of the terms that can be used to
+filter searches of the nucleotide database using the
+<a href="https://www.ncbi.nlm.nih.gov/nuccore/advanced">advanced search
+for that database</a>. On that page, selecting “Filter” from the first
+drop-down box then clicking “Show index list” will allow the user to
+scroll through possible filtering terms.</p>
+<h3>Precise queries using MeSH terms</h3>
+<p>In addition to the search terms described above, the NCBI allows
+searches using <a href="https://www.ncbi.nlm.nih.gov/mesh">Medical
+Subject Heading (MeSH)</a> terms. These terms create a ‘controlled
+vocabulary’, and allow users to make very finely controlled queries of
+databases.</p>
+<p>For instance, if you were interested in reviewing studies on how a
+class of anti-malarial drugs called Folic Acid Antagonists work against
+<em>Plasmodium vivax</em> (a particular species of malarial parasite),
+you could use this search:</p>
+<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_search</span>(<span class="at">db   =</span> <span class="st">&quot;pubmed&quot;</span>,</span>
+<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>              <span class="at">term =</span> <span class="st">&quot;(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])&quot;</span>)</span></code></pre></div>
+<p>The complete set of MeSH terms is available as a database from the
+NCBI. That means it is possible to download detailed information about
+each term and find the ways in which terms relate to each other using
+<code>rentrez</code>. You can search for specific terms with
+<code>entrez_search(db=&quot;mesh&quot;, term =...)</code> and learn about the
+results of your search using the tools described below.</p>
+</div>
+<div id="advanced-counting" class="section level3">
+<h3>Advanced counting</h3>
+<p>As you can see above, the object returned by
+<code>entrez_search()</code> includes the number of records matching a
+given search. This means you can learn a little about the composition
+of, or trends in, the records stored in the NCBI’s databases using only
+the search utility. For instance, let’s track the rise of the scientific
+buzzword “connectome” in PubMed, programmatically creating search terms
+for the <code>PDAT</code> field:</p>
+<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>search_year <span class="ot">&lt;-</span> <span class="cf">function</span>(year, term){</span>
+<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>    query <span class="ot">&lt;-</span> <span class="fu">paste</span>(term, <span class="st">&quot;AND (&quot;</span>, year, <span class="st">&quot;[PDAT])&quot;</span>)</span>
+<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>    <span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;pubmed&quot;</span>, <span class="at">term=</span>query, <span class="at">retmax=</span><span class="dv">0</span>)<span class="sc">$</span>count</span>
+<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>}</span>
+<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>year <span class="ot">&lt;-</span> <span class="dv">2008</span><span class="sc">:</span><span class="dv">2014</span></span>
+<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a>papers <span class="ot">&lt;-</span> <span class="fu">sapply</span>(year, search_year, <span class="at">term=</span><span class="st">&quot;Connectome&quot;</span>, <span class="at">USE.NAMES=</span><span class="cn">FALSE</span>)</span>
+<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(year, papers, <span class="at">type=</span><span class="st">&#39;b&#39;</span>, <span class="at">main=</span><span class="st">&quot;The Rise of the Connectome&quot;</span>)</span></code></pre></div>
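+<p>The same year-by-year counting can be arranged so that the query-building step is checkable without network access. This is only a sketch: <code>count_by_year()</code> and <code>fake_count()</code> are hypothetical names introduced here, and the stand-in counter returns the length of the query string rather than a real publication count.</p>

```r
# Hypothetical wrapper: count_fun is any function that maps a query string to
# a hit count, so the query-building logic can be checked offline.
count_by_year <- function(term, years, count_fun) {
  queries <- sprintf("%s AND %d[PDAT]", term, as.integer(years))
  counts <- vapply(queries, count_fun, numeric(1), USE.NAMES = FALSE)
  data.frame(year = years, query = queries, count = counts,
             stringsAsFactors = FALSE)
}

# Offline stand-in for a real counter; it returns the number of characters in
# the query, NOT a publication count.
fake_count <- function(query) as.numeric(nchar(query))

trend <- count_by_year("Connectome", 2008:2010, fake_count)
print(trend)

# With rentrez installed and network access, a real counter could look like:
# real_count <- function(query) {
#   as.numeric(rentrez::entrez_search(db = "pubmed", term = query, retmax = 0)$count)
# }
```

+<p>Keeping the queries in the returned data frame makes it easy to confirm exactly which search produced each count.</p>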
+</div>
+</div>
+<div id="finding-cross-references-entrez_link" class="section level2">
+<h2>Finding cross-references: <code>entrez_link()</code></h2>
+<p>One of the strengths of the NCBI databases is the degree to which
+records of one type are connected to other records within the NCBI or to
+external data sources. The function <code>entrez_link()</code> allows
+users to discover these links between records.</p>
+<div id="my-god-its-full-of-links" class="section level3">
+<h3>My god, it’s full of links</h3>
+<p>To get an idea of the degree to which records in the NCBI are
+cross-linked we can find all NCBI data associated with a single gene (in
+this case the Amyloid Beta Precursor gene, the product of which is
+associated with the plaques that form in the brains of Alzheimer’s
+Disease patients).</p>
+<p>The function <code>entrez_link()</code> can be used to find
+cross-referenced records. In the most basic case we need to provide an
+ID (<code>id</code>), the database from which this ID comes
+(<code>dbfrom</code>) and the name of a database in which to find linked
+records (<code>db</code>). If we set this last argument to ‘all’ we can
+find links in multiple databases:</p>
+<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>all_the_links <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&#39;gene&#39;</span>, <span class="at">id=</span><span class="dv">351</span>, <span class="at">db=</span><span class="st">&#39;all&#39;</span>)</span>
+<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>all_the_links</span></code></pre></div>
+<p>Just as with <code>entrez_search</code> the returned object behaves
+like a list, and we can learn a little about its contents by printing
+it. In this case, all of the information is in <code>links</code> (and
+there’s a lot of them!):</p>
+<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>all_the_links<span class="sc">$</span>links</span></code></pre></div>
+<p>The names of the list elements are in the format
+<code>[source_database]_[linked_database]</code> and the elements
+themselves contain a vector of linked-IDs. So, if we want to find open
+access publications associated with this gene we could get linked
+records in PubMed Central:</p>
+<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>all_the_links<span class="sc">$</span>links<span class="sc">$</span>gene_pmc[<span class="dv">1</span><span class="sc">:</span><span class="dv">10</span>]</span></code></pre></div>
+<p>Or if we were interested in this gene’s role in diseases we could find
+links to ClinVar:</p>
+<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a>all_the_links<span class="sc">$</span>links<span class="sc">$</span>gene_clinvar</span></code></pre></div>
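+<p>Because the element names follow the <code>[source_database]_[linked_database]</code> pattern, they can be tabulated to give an overview of what was found. The sketch below uses a made-up list, <code>mock_links</code>, in place of <code>all_the_links$links</code>; the IDs in it are placeholders, not real records.</p>

```r
# Made-up stand-in for all_the_links$links; the IDs are placeholders.
mock_links <- list(
  gene_pmc = c("111", "222"),
  gene_clinvar = c("333"),
  gene_nuccore_refseqrna = c("444", "555", "666")
)

# Split each name at the first underscore: everything before it is the source
# database, everything after it is the linked database (plus any subset name).
link_table <- data.frame(
  source = sub("_.*$", "", names(mock_links)),
  target = sub("^[^_]+_", "", names(mock_links)),
  n_ids  = unname(lengths(mock_links)),
  stringsAsFactors = FALSE
)
print(link_table)
```

+<p>Splitting only at the first underscore keeps subset names such as <code>nuccore_refseqrna</code> in one piece.</p>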
+</div>
+<div id="narrowing-our-focus" class="section level3">
+<h3>Narrowing our focus</h3>
+<p>If we know beforehand what sort of links we’d like to find, we can
+use the <code>db</code> argument to narrow the focus of a call to
+<code>entrez_link</code>.</p>
+<p>For instance, say we are interested in knowing about all of the RNA
+transcripts associated with the Amyloid Beta Precursor gene in humans.
+Transcript sequences are stored in the nucleotide database (referred to
+as <code>nuccore</code> in EUtils), so to find transcripts associated
+with a given gene we need to set <code>dbfrom=gene</code> and
+<code>db=nuccore</code>.</p>
+<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a>nuc_links <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&#39;gene&#39;</span>, <span class="at">id=</span><span class="dv">351</span>, <span class="at">db=</span><span class="st">&#39;nuccore&#39;</span>)</span>
+<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>nuc_links</span>
+<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>nuc_links<span class="sc">$</span>links</span></code></pre></div>
+<p>The object we get back contains links to the nucleotide database
+generally, but also to special subsets of that database like <a href="https://www.ncbi.nlm.nih.gov/refseq/">refseq</a>. We can take
+advantage of this narrower set of links to find IDs that match unique
+transcripts from our gene of interest.</p>
+<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a>nuc_links<span class="sc">$</span>links<span class="sc">$</span>gene_nuccore_refseqrna</span></code></pre></div>
+<p>We can use these ids in calls to <code>entrez_fetch()</code> or
+<code>entrez_summary()</code> to learn more about the transcripts they
+represent.</p>
+</div>
+<div id="external-links" class="section level3">
+<h3>External links</h3>
+<p>In addition to finding data within the NCBI, <code>entrez_link</code>
+can turn up connections to external databases. Perhaps the most
+interesting example is finding links to the full text of papers in
+PubMed. For example, when I wrote this document the first paper linked
+to Amyloid Beta Precursor had a unique ID of <code>25500142</code>. We
+can find links to the full text of that paper with
+<code>entrez_link</code> by setting the <code>cmd</code> argument to
+‘llinks’:</p>
+<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>paper_links <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&quot;pubmed&quot;</span>, <span class="at">id=</span><span class="dv">25500142</span>, <span class="at">cmd=</span><span class="st">&quot;llinks&quot;</span>)</span>
+<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>paper_links</span></code></pre></div>
+<p>Each element of the <code>linkouts</code> object contains information
+about an external source of data on this paper:</p>
+<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a>paper_links<span class="sc">$</span>linkouts</span></code></pre></div>
+<p>Each of those linkout objects contains quite a lot of information,
+but the URL is probably the most useful. For that reason,
+<code>rentrez</code> provides the function <code>linkout_urls</code> to
+make extracting just the URL simple:</p>
+<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="fu">linkout_urls</span>(paper_links)</span></code></pre></div>
+<p>The full list of options for the <code>cmd</code> argument is given
+in the in-line documentation (<code>?entrez_link</code>). If you are
+interested in finding full text records for a large number of articles,
+check out the package <a href="https://github.com/ropensci/fulltext">fulltext</a>, which makes use
+of multiple sources (including the NCBI) to discover the full text
+articles.</p>
+</div>
+<div id="using-more-than-one-id" class="section level3">
+<h3>Using more than one ID</h3>
+<p>It is possible to pass more than one ID to
+<code>entrez_link()</code>. By default, doing so will give you a single
+elink object containing the complete set of links for <em>all</em> of
+the IDs that you specified. So, if you were looking for protein IDs
+related to specific genes you could do:</p>
+<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a>all_links_together  <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">db=</span><span class="st">&quot;protein&quot;</span>, <span class="at">dbfrom=</span><span class="st">&quot;gene&quot;</span>, <span class="at">id=</span><span class="fu">c</span>(<span class="st">&quot;93100&quot;</span>, <span class="st">&quot;223646&quot;</span>))</span>
+<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>all_links_together</span>
+<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a>all_links_together<span class="sc">$</span>links<span class="sc">$</span>gene_protein</span></code></pre></div>
+<p>Although this behaviour might sometimes be useful, it means we’ve
+lost track of which <code>protein</code> ID is linked to which
+<code>gene</code> ID. To retain that information we can set
+<code>by_id</code> to <code>TRUE</code>. This gives us a list of elink
+objects, each one containing links from a single <code>gene</code>
+ID:</p>
+<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a>all_links_sep  <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">db=</span><span class="st">&quot;protein&quot;</span>, <span class="at">dbfrom=</span><span class="st">&quot;gene&quot;</span>, <span class="at">id=</span><span class="fu">c</span>(<span class="st">&quot;93100&quot;</span>, <span class="st">&quot;223646&quot;</span>), <span class="at">by_id=</span><span class="cn">TRUE</span>)</span>
+<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>all_links_sep</span>
+<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a><span class="fu">lapply</span>(all_links_sep, <span class="cf">function</span>(x) x<span class="sc">$</span>links<span class="sc">$</span>gene_protein)</span></code></pre></div>
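+<p>Once the links are kept separate, they can be reshaped into a table of gene-protein pairs. This sketch uses a made-up list, <code>mock_links_sep</code>, shaped like the result described above, with placeholder protein IDs. It also assumes the elements come back in the same order as the IDs that were supplied, which is worth checking against real output.</p>

```r
gene_ids <- c("93100", "223646")

# Made-up stand-in for the list returned with by_id=TRUE; the protein IDs are
# placeholders, not real records.
mock_links_sep <- list(
  list(links = list(gene_protein = c("P1", "P2"))),
  list(links = list(gene_protein = c("P3")))
)

# Name each element by the gene ID assumed to have produced it...
proteins_by_gene <- setNames(
  lapply(mock_links_sep, function(x) x$links$gene_protein),
  gene_ids
)

# ...then flatten into a two-column table with one row per gene-protein pair.
gene_protein_pairs <- data.frame(
  gene    = rep(names(proteins_by_gene), lengths(proteins_by_gene)),
  protein = unlist(proteins_by_gene, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(gene_protein_pairs)
```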
+</div>
+</div>
+<div id="getting-summary-data-entrez_summary" class="section level2">
+<h2>Getting summary data: <code>entrez_summary()</code></h2>
+<p>Having found the unique IDs for some records via
+<code>entrez_search</code> or <code>entrez_link()</code>, you are
+probably going to want to learn something about them. The
+<code>Eutils</code> API has two ways to get information about a record.
+<code>entrez_fetch()</code> returns ‘full’ records in varying formats
+and <code>entrez_summary()</code> returns less information about each
+record, but in a relatively simple format. Very often the summary records
+have the information you are after, so <code>rentrez</code> provides
+functions to parse and summarise summary records.</p>
+<div id="the-summary-record" class="section level3">
+<h3>The summary record</h3>
+<p><code>entrez_summary()</code> takes a vector of unique IDs for the
+records you want to get summary information from. Let’s start by finding
+out something about the paper describing <a href="https://github.com/ropensci/taxize">Taxize</a>, using its PubMed
+ID:</p>
+<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a>taxize_summ <span class="ot">&lt;-</span> <span class="fu">entrez_summary</span>(<span class="at">db=</span><span class="st">&quot;pubmed&quot;</span>, <span class="at">id=</span><span class="dv">24555091</span>)</span>
+<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>taxize_summ</span></code></pre></div>
+<p>Once again, the object returned by <code>entrez_summary</code>
+behaves like a list, so you can extract elements using <code>$</code>.
+For instance, we could convert our PubMed ID to another article
+identifier…</p>
+<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a>taxize_summ<span class="sc">$</span>articleids</span></code></pre></div>
+<p>…or see how many times the article has been cited in PubMed Central
+papers:</p>
+<div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a>taxize_summ<span class="sc">$</span>pmcrefcount</span></code></pre></div>
+</div>
+<div id="dealing-with-many-records" class="section level3">
+<h3>Dealing with many records</h3>
+<p>If you give <code>entrez_summary()</code> a vector with more than one
+ID you’ll get a list of summary records back. Let’s get those
+<em>Plasmodium vivax</em> papers we found in the
+<code>entrez_search()</code> section back, and fetch some summary data
+on each paper:</p>
+<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a>vivax_search <span class="ot">&lt;-</span> <span class="fu">entrez_search</span>(<span class="at">db =</span> <span class="st">&quot;pubmed&quot;</span>,</span>
+<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a>                              <span class="at">term =</span> <span class="st">&quot;(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])&quot;</span>)</span>
+<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a>multi_summs <span class="ot">&lt;-</span> <span class="fu">entrez_summary</span>(<span class="at">db=</span><span class="st">&quot;pubmed&quot;</span>, <span class="at">id=</span>vivax_search<span class="sc">$</span>ids)</span></code></pre></div>
+<p><code>rentrez</code> provides a helper function,
+<code>extract_from_esummary()</code> that takes one or more elements
+from every summary record in one of these lists. Here it is working with
+one…</p>
+<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="fu">extract_from_esummary</span>(multi_summs, <span class="st">&quot;fulljournalname&quot;</span>)</span></code></pre></div>
+<p>… and several elements:</p>
+<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a>date_and_cite <span class="ot">&lt;-</span> <span class="fu">extract_from_esummary</span>(multi_summs, <span class="fu">c</span>(<span class="st">&quot;pubdate&quot;</span>, <span class="st">&quot;pmcrefcount&quot;</span>,  <span class="st">&quot;title&quot;</span>))</span>
+<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a>knitr<span class="sc">::</span><span class="fu">kable</span>(<span class="fu">head</span>(<span class="fu">t</span>(date_and_cite)), <span class="at">row.names=</span><span class="cn">FALSE</span>)</span></code></pre></div>
+</div>
+</div>
+<div id="fetching-full-records-entrez_fetch" class="section level2">
+<h2>Fetching full records: <code>entrez_fetch()</code></h2>
+<p>As useful as the summary records are, sometimes they just don’t have
+the information that you need. If you want a complete representation of
+a record you can use <code>entrez_fetch</code>, using the argument
+<code>rettype</code> to specify the format you’d like the record in.</p>
+<div id="fetch-dna-sequences-in-fasta-format" class="section level3">
+<h3>Fetch DNA sequences in fasta format</h3>
+<p>Let’s extend the example given in the <code>entrez_link()</code>
+section about finding transcripts for a given gene. This time we will
+fetch cDNA sequences of those transcripts. We can start by repeating the
+steps in the earlier example to get nucleotide IDs for refseq
+transcripts of two genes:</p>
+<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a>gene_ids <span class="ot">&lt;-</span> <span class="fu">c</span>(<span class="dv">351</span>, <span class="dv">11647</span>)</span>
+<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a>linked_seq_ids <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&quot;gene&quot;</span>, <span class="at">id=</span>gene_ids, <span class="at">db=</span><span class="st">&quot;nuccore&quot;</span>)</span>
+<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a>linked_transripts <span class="ot">&lt;-</span> linked_seq_ids<span class="sc">$</span>links<span class="sc">$</span>gene_nuccore_refseqrna</span>
+<span id="cb31-4"><a href="#cb31-4" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(linked_transripts)</span></code></pre></div>
+<p>Now we can get our sequences with <code>entrez_fetch</code>, setting
+<code>rettype</code> to “fasta” (the list of formats available for <a href="https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/">each
+database is given in this table</a>):</p>
+<div class="sourceCode" id="cb32"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a>all_recs <span class="ot">&lt;-</span> <span class="fu">entrez_fetch</span>(<span class="at">db=</span><span class="st">&quot;nuccore&quot;</span>, <span class="at">id=</span>linked_transripts, <span class="at">rettype=</span><span class="st">&quot;fasta&quot;</span>)</span>
+<span id="cb32-2"><a href="#cb32-2" aria-hidden="true" tabindex="-1"></a><span class="fu">class</span>(all_recs)</span>
+<span id="cb32-3"><a href="#cb32-3" aria-hidden="true" tabindex="-1"></a><span class="fu">nchar</span>(all_recs)</span></code></pre></div>
+<p>Congratulations, now you have a really huge character vector! Rather
+than printing all those thousands of bases we can take a peek at the top
+of the file:</p>
+<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a><span class="fu">cat</span>(<span class="fu">strwrap</span>(<span class="fu">substr</span>(all_recs, <span class="dv">1</span>, <span class="dv">500</span>)), <span class="at">sep=</span><span class="st">&quot;</span><span class="sc">\n</span><span class="st">&quot;</span>)</span></code></pre></div>
+<p>If we wanted to use these sequences in some other application we
+could write them to file:</p>
+<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="fu">write</span>(all_recs, <span class="at">file=</span><span class="st">&quot;my_transcripts.fasta&quot;</span>)</span></code></pre></div>
+<p>Alternatively, if we want to use them within an R session
+we could write them to a temporary file then read that. In this case I’m
+using <code>read.dna()</code> from the phylogenetics package ape (but not
+executing the code block in this vignette, so you don’t have to install
+that package):</p>
+<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a>temp <span class="ot">&lt;-</span> <span class="fu">tempfile</span>()</span>
+<span id="cb35-2"><a href="#cb35-2" aria-hidden="true" tabindex="-1"></a><span class="fu">write</span>(all_recs, temp)</span>
+<span id="cb35-3"><a href="#cb35-3" aria-hidden="true" tabindex="-1"></a>parsed_recs <span class="ot">&lt;-</span> ape<span class="sc">::</span><span class="fu">read.dna</span>(temp, <span class="at">format=</span><span class="st">&quot;fasta&quot;</span>)</span></code></pre></div>
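+<p>As a rough sketch of the same write-then-read round trip using only
+base R, the block below writes a short, made-up FASTA string to a
+temporary file and reads it back. The names <code>toy_recs</code> and
+<code>seqs</code> are illustrative only, and the parsing is a
+simplification that assumes every record starts with a
+<code>&gt;</code> header line; it is not a substitute for a proper FASTA
+parser:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Made-up FASTA text standing in for the string returned by entrez_fetch()
+toy_recs &lt;- &quot;&gt;seq1 example one\nACGTAC\nGTA\n\n&gt;seq2 example two\nTTGACA\n&quot;
+
+temp &lt;- tempfile(fileext = &quot;.fasta&quot;)
+write(toy_recs, temp)
+
+lines &lt;- readLines(temp)
+lines &lt;- lines[nzchar(lines)]          # drop blank lines
+is_header &lt;- startsWith(lines, &quot;&gt;&quot;)     # TRUE for each record's header line
+record_id &lt;- cumsum(is_header)         # which record each line belongs to
+
+# Join the sequence lines of each record into one string
+seqs &lt;- vapply(split(lines[!is_header], record_id[!is_header]),
+               paste, character(1), collapse = &quot;&quot;)
+names(seqs) &lt;- sub(&quot;^&gt;&quot;, &quot;&quot;, lines[is_header])
+nchar(seqs)</code></pre></div>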
+</div>
+<div id="fetch-a-parsed-xml-document" class="section level3">
+<h3>Fetch a parsed XML document</h3>
+<p>Most of the NCBI’s databases can return records in XML format. In
+addition to downloading the text-representation of these files,
+<code>entrez_fetch()</code> can return objects parsed by the
+<code>XML</code> package. As an example, we can check out the Taxonomy
+database’s record for (did I mention they are amazing….) <em>Tetrahymena
+thermophila</em>, specifying we want the result to be parsed by setting
+<code>parsed=TRUE</code>:</p>
+<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a>Tt <span class="ot">&lt;-</span> <span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;taxonomy&quot;</span>, <span class="at">term=</span><span class="st">&quot;(Tetrahymena thermophila[ORGN]) AND Species[RANK]&quot;</span>)</span>
+<span id="cb36-2"><a href="#cb36-2" aria-hidden="true" tabindex="-1"></a>tax_rec <span class="ot">&lt;-</span> <span class="fu">entrez_fetch</span>(<span class="at">db=</span><span class="st">&quot;taxonomy&quot;</span>, <span class="at">id=</span>Tt<span class="sc">$</span>ids, <span class="at">rettype=</span><span class="st">&quot;xml&quot;</span>, <span class="at">parsed=</span><span class="cn">TRUE</span>)</span>
+<span id="cb36-3"><a href="#cb36-3" aria-hidden="true" tabindex="-1"></a><span class="fu">class</span>(tax_rec)</span></code></pre></div>
+<p>The package XML (which you have if you have installed
+<code>rentrez</code>) provides functions to get information from these
+files. For relatively simple records like this one you can use
+<code>XML::xmlToList</code>:</p>
+<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a>tax_list <span class="ot">&lt;-</span> XML<span class="sc">::</span><span class="fu">xmlToList</span>(tax_rec)</span>
+<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a>tax_list<span class="sc">$</span>Taxon<span class="sc">$</span>GeneticCode</span></code></pre></div>
+<p>For more complex records, which generate deeply-nested lists, you can
+use <a href="https://en.wikipedia.org/wiki/XPath">XPath expressions</a>
+along with the function <code>XML::xpathSApply</code> or the extraction
+operators <code>[</code> and <code>[[</code> to extract specific parts
+of the file. For instance, we can get the scientific name of each taxon
+in <em>T. thermophila</em>’s lineage by specifying a path through the
+XML:</p>
+<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="#cb38-1" aria-hidden="true" tabindex="-1"></a>tt_lineage <span class="ot">&lt;-</span> tax_rec[<span class="st">&quot;//LineageEx/Taxon/ScientificName&quot;</span>]</span>
+<span id="cb38-2"><a href="#cb38-2" aria-hidden="true" tabindex="-1"></a>tt_lineage[<span class="dv">1</span><span class="sc">:</span><span class="dv">4</span>]</span></code></pre></div>
+<p>As the name suggests, <code>XML::xpathSApply()</code> is a
+counterpart of base R’s <code>sapply</code>, and can be used to apply a
+function to nodes in an XML object. A particularly useful function to
+apply is <code>XML::xmlValue</code>, which returns the content of the
+node:</p>
+<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a>XML<span class="sc">::</span><span class="fu">xpathSApply</span>(tax_rec, <span class="st">&quot;//LineageEx/Taxon/ScientificName&quot;</span>, XML<span class="sc">::</span>xmlValue)</span></code></pre></div>
+<p>There are a few more complex examples of using <code>XPath</code> <a href="https://github.com/ropensci/rentrez/wiki">on the rentrez
+wiki</a>.</p>
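+<p>To experiment with XPath expressions without contacting the NCBI, one
+option is to parse a small hand-written document. The XML below is a
+toy example that only loosely mimics the shape of a Taxonomy record (it
+is not real NCBI output), and <code>toy_xml</code>, <code>toy_rec</code>
+and <code>lineage</code> are names made up for this sketch:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hand-written toy document, not a real NCBI record
+toy_xml &lt;- paste0(
+    &quot;&lt;TaxaSet&gt;&lt;Taxon&gt;&quot;,
+    &quot;&lt;ScientificName&gt;Tetrahymena thermophila&lt;/ScientificName&gt;&quot;,
+    &quot;&lt;LineageEx&gt;&quot;,
+    &quot;&lt;Taxon&gt;&lt;ScientificName&gt;Eukaryota&lt;/ScientificName&gt;&lt;/Taxon&gt;&quot;,
+    &quot;&lt;Taxon&gt;&lt;ScientificName&gt;Ciliophora&lt;/ScientificName&gt;&lt;/Taxon&gt;&quot;,
+    &quot;&lt;/LineageEx&gt;&quot;,
+    &quot;&lt;/Taxon&gt;&lt;/TaxaSet&gt;&quot;
+)
+toy_rec &lt;- XML::xmlParse(toy_xml, asText = TRUE)
+
+# Only the ScientificName nodes nested under LineageEx match this path
+lineage &lt;- XML::xpathSApply(toy_rec, &quot;//LineageEx/Taxon/ScientificName&quot;, XML::xmlValue)
+lineage</code></pre></div>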
+<p><a name="web_history"></a></p>
+</div>
+</div>
+<div id="using-ncbis-web-history-features" class="section level2">
+<h2>Using NCBI’s Web History features</h2>
+<p>When you are dealing with very large queries it can be time consuming
+to pass long vectors of unique IDs to and from the NCBI. To avoid this
+problem, the NCBI provides a feature called “web history” which allows
+users to store IDs on the NCBI servers then refer to them in future
+calls.</p>
+<h3>Post a set of IDs to the NCBI for later use:
+<code>entrez_post()</code></h3>
+<p>If you have a list of many NCBI IDs that you want to use later on,
+you can post them to the NCBI’s servers. In order to provide a brief
+example, I’m going to post just one ID, the <code>omim</code> identifier
+for asthma:</p>
+<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a>upload <span class="ot">&lt;-</span> <span class="fu">entrez_post</span>(<span class="at">db=</span><span class="st">&quot;omim&quot;</span>, <span class="at">id=</span><span class="dv">600807</span>)</span>
+<span id="cb40-2"><a href="#cb40-2" aria-hidden="true" tabindex="-1"></a>upload</span></code></pre></div>
+<p>The NCBI sends you back some information you can use to refer to the
+posted IDs. In <code>rentrez</code>, that information is represented as
+a <code>web_history</code> object.</p>
+<p>Note that if you have a very long list of IDs you may receive a 414
+error when you try to upload them. If you have such a list (and they
+come from an external source rather than a search that can be saved to a
+<code>web_history</code> object), you may have to ‘chunk’ the IDs into
+smaller sets that can be processed.</p>
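+<p>One way to do that chunking is sketched below.
+<code>chunk_ids()</code> is a hypothetical helper (not part of
+<code>rentrez</code>), the IDs are made up, and the chunk size of 200 is
+an arbitrary choice for illustration rather than a documented limit:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: split a vector of IDs into chunks of at most `size`
+chunk_ids &lt;- function(ids, size = 200) {
+    split(ids, ceiling(seq_along(ids) / size))
+}
+
+ids &lt;- 1:450                       # made-up IDs, for illustration only
+chunks &lt;- chunk_ids(ids, size = 200)
+lengths(chunks)
+
+# Each chunk could then be posted separately (not run here):
+# uploads &lt;- lapply(chunks, function(chunk) entrez_post(db=&quot;omim&quot;, id=chunk))</code></pre></div>
+<p>Posting the chunks this way would give one <code>web_history</code>
+object per chunk, so later calls would need to loop over them.</p>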
+<div id="get-a-web_history-object-from-entrez_search-or-entrez_link" class="section level3">
+<h3>Get a <code>web_history</code> object from
+<code>entrez_search</code> or <code>entrez_link()</code></h3>
+<p>In addition to directly uploading IDs to the NCBI, you can use the
+web history features with <code>entrez_search</code> and
+<code>entrez_link</code>. For instance, imagine you wanted to find all
+of the sequences of the widely-studied gene COI from all snails (which
+are members of the taxonomic group Gastropoda):</p>
+<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;nuccore&quot;</span>, <span class="at">term=</span><span class="st">&quot;COI[Gene] AND Gastropoda[ORGN]&quot;</span>)</span></code></pre></div>
+<p>That’s a lot of sequences! If you really wanted to download all of
+these it would be a good idea to save all those IDs to the server by
+setting <code>use_history</code> to <code>TRUE</code> (note you now get
+a <code>web_history</code> object along with your normal search
+result):</p>
+<div class="sourceCode" id="cb42"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb42-1"><a href="#cb42-1" aria-hidden="true" tabindex="-1"></a>snail_coi <span class="ot">&lt;-</span> <span class="fu">entrez_search</span>(<span class="at">db=</span><span class="st">&quot;nuccore&quot;</span>, <span class="at">term=</span><span class="st">&quot;COI[Gene] AND Gastropoda[ORGN]&quot;</span>, <span class="at">use_history=</span><span class="cn">TRUE</span>)</span>
+<span id="cb42-2"><a href="#cb42-2" aria-hidden="true" tabindex="-1"></a>snail_coi</span>
+<span id="cb42-3"><a href="#cb42-3" aria-hidden="true" tabindex="-1"></a>snail_coi<span class="sc">$</span>web_history</span></code></pre></div>
+<p>Similarly, <code>entrez_link()</code> can return
+<code>web_history</code> objects by using the <code>cmd</code>
+<code>neighbor_history</code>. Let’s find genetic variants (from the
+clinvar database) associated with asthma (using the same OMIM ID we
+identified earlier):</p>
+<div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb43-1"><a href="#cb43-1" aria-hidden="true" tabindex="-1"></a>asthma_clinvar <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&quot;omim&quot;</span>, <span class="at">db=</span><span class="st">&quot;clinvar&quot;</span>, <span class="at">cmd=</span><span class="st">&quot;neighbor_history&quot;</span>, <span class="at">id=</span><span class="dv">600807</span>)</span>
+<span id="cb43-2"><a href="#cb43-2" aria-hidden="true" tabindex="-1"></a>asthma_clinvar<span class="sc">$</span>web_histories</span></code></pre></div>
+<p>As you can see, instead of returning lists of IDs for each linked
+database (as it would by default), <code>entrez_link()</code> now
+returns a list of web_histories.</p>
+</div>
+<div id="use-a-web_history-object" class="section level3">
+<h3>Use a <code>web_history</code> object</h3>
+<p>Once you have those IDs stored on the NCBI’s servers, you are going
+to want to do something with them. The functions
+<code>entrez_fetch()</code>, <code>entrez_summary()</code> and
+<code>entrez_link()</code> can all use <code>web_history</code> objects
+in exactly the same way they use IDs.</p>
+<p>So, we could repeat the last example (finding variants linked to
+asthma), but this time using the ID we uploaded earlier:</p>
+<div class="sourceCode" id="cb44"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb44-1"><a href="#cb44-1" aria-hidden="true" tabindex="-1"></a>asthma_variants <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&quot;omim&quot;</span>, <span class="at">db=</span><span class="st">&quot;clinvar&quot;</span>, <span class="at">cmd=</span><span class="st">&quot;neighbor_history&quot;</span>, <span class="at">web_history=</span>upload)</span>
+<span id="cb44-2"><a href="#cb44-2" aria-hidden="true" tabindex="-1"></a>asthma_variants</span></code></pre></div>
+<p>… if we want to get some genetic information about these variants we
+need to map our clinvar IDs to SNP IDs:</p>
+<div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a>snp_links <span class="ot">&lt;-</span> <span class="fu">entrez_link</span>(<span class="at">dbfrom=</span><span class="st">&quot;clinvar&quot;</span>, <span class="at">db=</span><span class="st">&quot;snp&quot;</span>, </span>
+<span id="cb45-2"><a href="#cb45-2" aria-hidden="true" tabindex="-1"></a>                         <span class="at">web_history=</span>asthma_variants<span class="sc">$</span>web_histories<span class="sc">$</span>omim_clinvar,</span>
+<span id="cb45-3"><a href="#cb45-3" aria-hidden="true" tabindex="-1"></a>                         <span class="at">cmd=</span><span class="st">&quot;neighbor_history&quot;</span>)</span>
+<span id="cb45-4"><a href="#cb45-4" aria-hidden="true" tabindex="-1"></a>snp_summ <span class="ot">&lt;-</span> <span class="fu">entrez_summary</span>(<span class="at">db=</span><span class="st">&quot;snp&quot;</span>, <span class="at">web_history=</span>snp_links<span class="sc">$</span>web_histories<span class="sc">$</span>clinvar_snp)</span>
+<span id="cb45-5"><a href="#cb45-5" aria-hidden="true" tabindex="-1"></a>knitr<span class="sc">::</span><span class="fu">kable</span>(<span class="fu">extract_from_esummary</span>(snp_summ, <span class="fu">c</span>(<span class="st">&quot;chr&quot;</span>, <span class="st">&quot;fxn_class&quot;</span>, <span class="st">&quot;global_maf&quot;</span>)))</span></code></pre></div>
+<p>If you really wanted to you could also use <code>web_history</code>
+objects to download all those thousands of COI sequences. When
+downloading large sets of data, it is a good idea to take advantage of
+the arguments <code>retmax</code> and <code>retstart</code> to split the
+request up into smaller chunks. For instance, we could get the first 200
+sequences in 50-sequence chunks:</p>
+<p>(note: this code block is not executed as part of the vignette to
+save time and bandwidth):</p>
+<div class="sourceCode" id="cb46"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb46-1"><a href="#cb46-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span>( seq_start <span class="cf">in</span> <span class="fu">seq</span>(<span class="dv">0</span>,<span class="dv">150</span>,<span class="dv">50</span>)){</span>
+<span id="cb46-2"><a href="#cb46-2" aria-hidden="true" tabindex="-1"></a>    recs <span class="ot">&lt;-</span> <span class="fu">entrez_fetch</span>(<span class="at">db=</span><span class="st">&quot;nuccore&quot;</span>, <span class="at">web_history=</span>snail_coi<span class="sc">$</span>web_history,</span>
+<span id="cb46-3"><a href="#cb46-3" aria-hidden="true" tabindex="-1"></a>                         <span class="at">rettype=</span><span class="st">&quot;fasta&quot;</span>, <span class="at">retmax=</span><span class="dv">50</span>, <span class="at">retstart=</span>seq_start)</span>
+<span id="cb46-4"><a href="#cb46-4" aria-hidden="true" tabindex="-1"></a>    <span class="fu">cat</span>(recs, <span class="at">file=</span><span class="st">&quot;snail_coi.fasta&quot;</span>, <span class="at">append=</span><span class="cn">TRUE</span>)</span>
+<span id="cb46-5"><a href="#cb46-5" aria-hidden="true" tabindex="-1"></a>    <span class="fu">cat</span>(seq_start<span class="sc">+</span><span class="dv">50</span>, <span class="st">&quot;sequences downloaded</span><span class="sc">\r</span><span class="st">&quot;</span>)</span>
+<span id="cb46-6"><a href="#cb46-6" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
+</div>
+</div>
+<div id="rate-limiting-and-api-keys" class="section level2">
+<h2>Rate-limiting and API Keys</h2>
+<p>By default, the NCBI limits users to making only 3 requests per
+second (and <code>rentrez</code> enforces that limit). Users who
+register for an “API key” are able to make up to ten requests per
+second. Getting one of these keys is simple: you just need to <a href="https://www.ncbi.nlm.nih.gov/account/">register for a “my ncbi”
+account</a> then click on a button in the <a href="https://www.ncbi.nlm.nih.gov/account/settings/">account settings
+page</a>.</p>
+<p>Once you have an API key, rentrez will allow you to take advantage of
+it. For one-off cases, this is as simple as adding the
+<code>api_key</code> argument to a given function call. (Note these
+examples are not executed as part of this document, as the API key used
+here is not a real one).</p>
+<div class="sourceCode" id="cb47"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb47-1"><a href="#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="fu">entrez_link</span>(<span class="at">db=</span><span class="st">&quot;protein&quot;</span>, <span class="at">dbfrom=</span><span class="st">&quot;gene&quot;</span>, <span class="at">id=</span><span class="dv">93100</span>, <span class="at">api_key =</span><span class="st">&quot;ABCD123&quot;</span>)</span></code></pre></div>
+<p>In most cases you will want to use your API key for each of several calls
+to the NCBI. <code>rentrez</code> makes this easy by allowing you to set
+an environment variable, <code>ENTREZ_KEY</code>. Once this value is set
+to your key <code>rentrez</code> will use it for all requests to the
+NCBI. To set the value for a single R session you can use the function
+<code>set_entrez_key()</code>. Here we set the value and confirm it is
+available.</p>
+<div class="sourceCode" id="cb48"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb48-1"><a href="#cb48-1" aria-hidden="true" tabindex="-1"></a><span class="fu">set_entrez_key</span>(<span class="st">&quot;ABCD123&quot;</span>)</span>
+<span id="cb48-2"><a href="#cb48-2" aria-hidden="true" tabindex="-1"></a><span class="fu">Sys.getenv</span>(<span class="st">&quot;ENTREZ_KEY&quot;</span>)</span></code></pre></div>
+<p>If you use <code>rentrez</code> often you should edit your
+<code>.Renviron</code> file (see <code>help(Startup)</code> for a
+description of this file) to include your key. Doing so will mean all
+requests you send will take advantage of your API key.</p>
+<div class="sourceCode" id="cb49"><pre class="sourceCode ini"><code class="sourceCode ini"><span id="cb49-1"><a href="#cb49-1" aria-hidden="true" tabindex="-1"></a><span class="dt">ENTREZ_KEY</span><span class="ot">=</span><span class="st">ABCD123</span></span></code></pre></div>
+<p>As long as an API key is set by one of these methods,
+<code>rentrez</code> will allow you to make up to ten requests per
+second.</p>
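+<p>A quick way to confirm that a key is visible to the current session
+is to check the environment variable directly. The helper name
+<code>has_entrez_key()</code> is made up for this sketch and is not a
+<code>rentrez</code> function; the key shown is the same placeholder
+used above, not a real one:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: TRUE if ENTREZ_KEY is set to a non-empty value
+has_entrez_key &lt;- function() {
+    nzchar(Sys.getenv(&quot;ENTREZ_KEY&quot;))
+}
+
+Sys.setenv(ENTREZ_KEY = &quot;ABCD123&quot;)   # placeholder value, not a real key
+has_entrez_key()</code></pre></div>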
+<div id="slowing-rentrez-down-when-you-hit-the-rate-limit" class="section level3">
+<h3>Slowing rentrez down when you hit the rate-limit</h3>
+<p><code>rentrez</code> won’t let you <em>send</em> requests to the NCBI
+at a rate higher than the rate-limit, but it is sometimes possible that
+they will <em>arrive</em> too close together and produce errors. If you
+are using <code>rentrez</code> functions in a for loop and find
+rate-limiting errors are occurring, you may consider adding a call to
+<code>Sys.sleep(0.1)</code> before each message sent to the NCBI. This
+should help you stay below the rate limit.</p>
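+<p>The pattern might look like the sketch below. To keep it runnable
+offline, <code>fake_request()</code> is a made-up stand-in for a real
+call such as <code>entrez_summary()</code>, and the IDs are invented; in
+real use you would replace it with the <code>rentrez</code> function you
+are calling:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Stand-in for a real request; it makes no network call
+fake_request &lt;- function(id) paste(&quot;record&quot;, id)
+
+ids &lt;- c(101, 102, 103)             # made-up IDs
+results &lt;- vector(&quot;list&quot;, length(ids))
+for (i in seq_along(ids)) {
+    Sys.sleep(0.1)                  # pause before each message sent to the NCBI
+    results[[i]] &lt;- fake_request(ids[i])
+}
+length(results)</code></pre></div>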
+</div>
+</div>
+<div id="what-next" class="section level2">
+<h2>What next?</h2>
+<p>This tutorial has introduced you to the core functions of
+<code>rentrez</code>; there are almost limitless ways that you could put
+them together. <a href="https://github.com/ropensci/rentrez/wiki">Check
+out the wiki</a> for more specific examples, and be sure to read the
+inline-documentation for each function. If you run into problems with
+rentrez, or just need help with the package and <code>Eutils</code>,
+please contact us by opening an issue at the <a href="https://github.com/ropensci/rentrez/issues">github
+repository</a>.</p>
+</div>
+
+
+
+<!-- code folding -->
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/man/entrez_citmatch.Rd b/man/entrez_citmatch.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_db_links.Rd b/man/entrez_db_links.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_db_searchable.Rd b/man/entrez_db_searchable.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_db_summary.Rd b/man/entrez_db_summary.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_dbs.Rd b/man/entrez_dbs.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_global_query.Rd b/man/entrez_global_query.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_info.Rd b/man/entrez_info.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_post.Rd b/man/entrez_post.Rd
old mode 100755
new mode 100644
diff --git a/man/entrez_summary.Rd b/man/entrez_summary.Rd
old mode 100755
new mode 100644
diff --git a/man/linkout_urls.Rd b/man/linkout_urls.Rd
old mode 100755
new mode 100644
diff --git a/man/parse_pubmed_xml.Rd b/man/parse_pubmed_xml.Rd
old mode 100755
new mode 100644
diff --git a/man/rentrez.Rd b/man/rentrez.Rd
old mode 100755
new mode 100644
diff --git a/tests/test-all.R b/tests/test-all.R
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_citmatch.r b/tests/testthat/test_citmatch.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_docs.r b/tests/testthat/test_docs.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_httr.r b/tests/testthat/test_httr.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_httr_post.r b/tests/testthat/test_httr_post.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_info.r b/tests/testthat/test_info.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_link.r b/tests/testthat/test_link.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_post.r b/tests/testthat/test_post.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_query.r b/tests/testthat/test_query.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_search.r b/tests/testthat/test_search.r
old mode 100755
new mode 100644
diff --git a/tests/testthat/test_webenv.r b/tests/testthat/test_webenv.r
old mode 100755
new mode 100644

Debdiff

[The following lists of changes regard files as different if they have different names, permissions or owners.]

Files in second set of .debs but not in first

-rw-r--r--  root/root   /usr/lib/R/site-library/rentrez/doc/rentrez_tutorial.html

Control files: lines which differ (wdiff format)

  • Depends: r-base-core (>= 4.2.2.20221110-2), 4.2.2.20221110-1), r-api-4.0, r-cran-xml, r-cran-httr (>= 0.5), r-cran-jsonlite (>= 0.9)

More details

Full run details