
occCite: Querying and Managing Large Biodiversity Occurrence Datasets #407

@hannahlowens


Date accepted: 2022-03-11
Submitting Author Name: Hannah L. Owens
Submitting Author Github Handle: @hannahlowens
Repository: https://github.com/hannahlowens/occCite
Version submitted: 0.4.0
Editor: @karthik
Reviewers: @peterdesmet, @damianooldoni
Due date for @peterdesmet: 2020-12-21
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: occCite
Type: Package
Title: Querying and Managing Large Biodiversity Occurrence Datasets
Version: 0.4.0
Authors@R: c(
  person(given = "Hannah L.", family = "Owens", ,"[email protected]", role=c("aut","cre"),
    comment = c(ORCID = "0000-0003-0071-1745")),
  person(given = "Cory", family = "Merow",,,role="aut",
    comment = c(ORCID = "0000-0003-0561-053X")),
  person(given = "Brian", family = "Maitner", ,,,role="aut",
    comment = c(ORCID = "0000-0002-2118-9880")),
  person(given = "Jamie M.", family = "Kass", ,,,role="aut",
    comment = c(ORCID = "0000-0002-9432-895X")),
  person(given = "Vijay", family = "Barve",,"[email protected] ",role="aut",
    comment = c(ORCID = "0000-0002-4852-2567")),
  person(given = "Robert P.", family = "Guralnick",,,role="aut",
    comment = c(ORCID = "0000-0001-6682-1504"))
  )
Author: Hannah L. Owens [aut, cre] (<https://orcid.org/0000-0003-0071-1745>), Cory Merow [aut] (<https://orcid.org/0000-0003-0561-053X>), Brian Maitner [aut] (<https://orcid.org/0000-0002-2118-9880>), Jamie M. Kass [aut] (<https://orcid.org/0000-0002-9432-895X>), Vijay Barve [aut] (<https://orcid.org/0000-0002-4852-2567>), Robert P. Guralnick [aut] (<https://orcid.org/0000-0001-6682-1504>)
Maintainer: Hannah L. Owens <[email protected]>
Description: Facilitates the gathering of biodiversity occurrence data 
  from disparate sources. Metadata is managed throughout the process to facilitate 
  reporting and enhanced ability to repeat analyses.
License: GPL (>= 2)
URL: https://github.com/hannahlowens/occCite
BugReports: https://github.com/hannahlowens/occCite/issues
Encoding: UTF-8
LazyData: true
Language: en-US
Depends: R (>= 3.5.0)
Suggests:
  rmarkdown,
  RColorBrewer,
  viridis,
  remotes,
  RefManageR
Imports:
    bib2df,
    BIEN,
    bit64,
    dplyr,
    ape,
    lubridate,
    methods,
    rgbif (>= 3.1),
    taxize,
    stringr,
    knitr,
    stats,
    leaflet,
    htmltools,
    ggplot2,
    rlang,
    magrittr,
    tidyr,
    RPostgreSQL,
    DBI,
    waffle
VignetteBuilder: knitr
RoxygenNote: 7.1.1

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    occCite retrieves species occurrence data from several aggregator databases, processes them into a single data object, and creates primary data provider citations based on the results.

  • Who is the target audience and what are scientific applications of this package?
    The target audience is primarily biogeographers and other researchers interested in the geographic distributions of biodiversity. occCite aims to close the gap in the citation cycle between primary data providers and final research products efficiently, allowing researchers to meet best-practice standards for biodiversity dataset documentation without sacrificing time and resources to the demands of providing increasing levels of detail on their datasets.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Tools in R do exist that enable researchers to document sources during the data collection process: rgbif (Chamberlain et al. 2020a) for GBIF and BIEN (Maitner 2020) for the Botanical Information and Ecology Network (BIEN) are both examples of API-interface packages that include valuable features for citing data harvested from their aggregator databases. However, each of these packages serves a single aggregator database and is designed for use cases specific to that database, and uniting results from several aggregators into a single dataset brings its own set of challenges. Multi-platform occurrence aggregators do exist; for example, searches using spocc (Chamberlain 2019) can return occurrence information from up to six aggregator databases. But the process of combining data from these aggregators in each query results in the loss of key metadata, particularly accession date, primary data source, and, in the case of GBIF, dataset DOIs. Finally, to our knowledge, no existing set of R tools manages metadata from an occurrence search and translates it into formatted citations for primary data providers that include accession dates and DOIs.

Yes

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you intend for this package to go on Bioconductor?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct
