Submitting a Datapage to the DANTE Project

Joshua Brinks, ISciences, LLC

Keywords:

contribute

Introduction

The DANTE Project provides an open source community platform to lower the barriers to entry for climate security research and policy making. One of the core components of the project is the dataset library. Although DANTE does not host or distribute datasets, we provide a catalogue of data widely used in human geography, political science, and global climate research. Moreover, in contrast to the Socioeconomic Data and Applications Center (SEDAC), the Humanitarian Data Exchange, and similar data warehouse hosting sites, we present complementary tools and commentary tailored to the catalogued datasets. In addition to standard information describing dataset authors, hosting, and spatial and temporal extents, this includes:

  • Discussion points on its use in the research and practitioner communities.
  • Critical commentary on where the dataset excels.
  • Conversely, the data’s potential biases, drawbacks, or other methodological flaws.
  • R packages or other software developed specifically to complement the dataset.
  • Vignettes and tutorials utilizing the data.
  • Direct commentary from DANTE users.

Completing the Template

Accessing the RMarkdown Template

The danteSubmit package can be installed from GitLab with the devtools package using the following command:

# Install danteSubmit from the dante-sttr namespace on GitLab
devtools::install_gitlab("dante-sttr/danteSubmit")

Following installation, the dataset template will be accessible in RStudio through the File > New File > R Markdown ... > From Template > Dante Dataset Submission menu. This will create a new directory in your home directory containing the dataset skeleton.Rmd file, which provides the framework of the submission along with several DANTE-specific metadata fields.

The rmarkdown template interface.

Template Fields

RMarkdown files such as skeleton.Rmd typically consist of two sections: 1) the YAML metadata header, and 2) the text body. The skeleton.Rmd begins with the YAML header, which is delimited above and below by a line of three dashes (---). The text body uses standard RMarkdown section headings. More information on RMarkdown formatting can be found on the official RMarkdown website. DANTE dataset submissions require no prior knowledge of RMarkdown syntax, and users may delete any YAML fields not relevant to the dataset (additional authors, strengths, weaknesses, spatial information for non-spatial data, etc.). Nearly every metadata field in a DANTE dataset submission is adapted from the Federal Geographic Data Committee’s (FGDC’s) Content Standard for Digital Geospatial Metadata (CSDGM). This standard is widely used and employs thoroughly vetted nomenclature and definitions.
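
To make the two-part structure concrete, the following base R sketch separates the lines of an .Rmd file into its YAML header and its text body by locating the two --- delimiter lines. The function name split_rmd() and the sample lines are hypothetical and are not part of danteSubmit; the sketch also assumes the header starts on the very first line of the file, as it does in skeleton.Rmd.

```r
# Hypothetical helper (not part of danteSubmit): split the lines of an
# RMarkdown file into the YAML header and the text body. Assumes the header
# opens on line 1 with "---" and closes at the next line that is only "---".
split_rmd <- function(lines) {
  delims <- which(trimws(lines) == "---")
  if (length(delims) < 2 || delims[1] != 1) {
    stop("No YAML header delimited by two '---' lines was found.")
  }
  list(
    yaml = lines[1 + seq_len(delims[2] - 2)],  # lines between the delimiters
    body = lines[-seq_len(delims[2])]          # everything after the header
  )
}

# Illustrative input; in practice this could come from readLines("skeleton.Rmd")
rmd <- c("---", "title: Example dataset", "edition: v1", "---",
         "", "# Discussion", "Some text.")
parts <- split_rmd(rmd)
parts$yaml
parts$body
```

If a parsed list of fields is preferred over raw lines, rmarkdown::yaml_front_matter() reads and parses the header of an .Rmd file directly.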

YAML Metadata

  • metadata-contact: Name, email, and affiliation (if applicable) of the individual or institution completing the DANTE dataset submission. Multiple authors are separated by -.

  • metadata-date: Date the dataset submission was prepared. Leave this field as is; the date is generated automatically when the document is compiled.

  • citation-information: Citation information for the dataset. When possible, populate these fields with the official citation metadata. When no accompanying manuscript or official citation exists, populate the fields with the best available information.

    • title: The official title of the dataset or accompanying manuscript.
    • edition: Current version or edition of the dataset.
    • publication-date: Date of the most recent release of the dataset.
    • geospatial-data-presentation-form: The form or data type of the dataset. At least one word describing the data format, e.g., raster, tabular, spatial points, shapefile, country-year, dyadic, etc.
    • publisher: Name of institution responsible for publication of the dataset.
    • online-linkage: URL for the location of the current version of the dataset.
    • dante-citekey: If the dataset already exists in the DANTE Citation Repository, list its citation key, which takes the form AuthorYEAR.
  • contact-information: Contact information for the dataset authors. Each author is separated by a -.

  • contact-person: Name(s), email(s), and affiliation(s) (if applicable) of the individual(s) or institution(s) who authored the dataset. When possible, these should match the information in any peer-reviewed manuscript that accompanied the release of the dataset. Multiple authors are separated by -.

  • dataset-strengths: 1-3 bullet points highlighting positive aspects of the dataset.

  • dataset-weaknesses: 1-3 bullet points highlighting areas where the dataset is limited.

  • abstract: The official abstract for the dataset, if one exists. It may be copied verbatim as long as either: 1) the DANTE submission contains the direct URL to the dataset hosting site, or 2) the DANTE submission contains the official citation of the dataset. If no abstract exists, the user may write one; otherwise, the discussion section of the submission template should contain adequate descriptive information.

  • use-constraints: Dataset license specification or written text describing dataset use restrictions.

  • spatial-information:

    • bounding-coordinates: Geographic scope of the dataset expressed as a four-point bounding box. When using R, these coordinates can be extracted with raster::extent().
    • spatial-reference-information:
      • coordinate-system: Dataset coordinate system (UTM, Latitude-Longitude, etc.).
      • resolution: Dataset spatial resolution.
      • units: Dataset resolution units (meters, decimal degrees, etc.).
      • geodetic-model: Geodetic model used for projection (commonly WGS1984).
  • time-period-information:

    • beginning-date: First date of observations.
    • ending-date: Final date of observations.
    • resolution: Integration period or temporal resolution of the dataset (annual, monthly, weekly, daily, etc.).
  • related-packages: R packages designed to acquire, process, analyze, or visualize the dataset.

  • related-vignettes: Vignettes or other tutorials featuring the dataset.

  • bibliography: File name of the bibliography used for citations in the “Discussion” section.

  • browse-image: File name of the image displayed while browsing the DANTE website. This may be left blank for project administrators to handle. If you would like to provide an image, please crop it to 300 x 225 pixels.

  • output: Identifies the RMarkdown template used to compile the submission. This field should not be altered by the user.
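
As a sketch of how the spatial and temporal fields might be filled in for point data, the base R function below computes a west/east/south/north bounding box and the first and last observation dates. The helper name extent_fields(), the column names lon, lat, and date, and the three example records are all hypothetical, and the coordinates are assumed to be in decimal degrees; for raster data, raster::extent() returns the equivalent bounding box.

```r
# Hypothetical helper: summarise a point dataset into values for the
# bounding-coordinates and time-period-information fields.
# Assumes columns lon and lat (decimal degrees) and date (class Date).
extent_fields <- function(df) {
  list(
    bounding_coordinates = c(
      west  = min(df$lon, na.rm = TRUE),
      east  = max(df$lon, na.rm = TRUE),
      south = min(df$lat, na.rm = TRUE),
      north = max(df$lat, na.rm = TRUE)
    ),
    beginning_date = format(min(df$date, na.rm = TRUE)),  # first observation
    ending_date    = format(max(df$date, na.rm = TRUE))   # final observation
  )
}

# Made-up example records, for illustration only
events <- data.frame(
  lon  = c(36.8, 32.6, 45.3),
  lat  = c(-1.3, 0.3, 2.0),
  date = as.Date(c("2015-03-01", "2019-11-20", "2012-07-14"))
)
ext <- extent_fields(events)
ext$bounding_coordinates
ext$beginning_date  # "2012-07-14"
ext$ending_date     # "2019-11-20"
```

The temporal resolution field still has to be set by hand, since it describes how the data were aggregated rather than something that can be read reliably off the date range.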

Body

  • Discussion: The discussion section should consist of 1-2 paragraphs of properly cited commentary on the submitted dataset. This includes, but is not limited to:

    • A more developed discussion of the dataset’s positive and negative aspects, beyond the bullet points listed in the YAML header.
    • Brief references to prominent peer-reviewed or commissioned reports featuring the dataset.
    • A brief passage describing the functionality of related R packages listed in the YAML header. Do they provide API access, data processing, analysis, or visualization functionality? Do they work with the current release of the data, or are they deprecated?
    • A brief passage describing the nature of the vignettes listed in the YAML header.
  • Screenshot or representative figure: A screenshot, map, or other figure illustrating the dataset. This may be a screen-captured image saved to the working directory, or a map or figure generated in the document via a code chunk.

  • Reference: This section should not be altered by the user. It generates the full citation for every reference cited in the “Discussion” section.
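
If the representative figure is generated in code, something along the lines of the following base R sketch could be used. It plots point locations and writes the figure to a PNG file at the 300 x 225 pixel size requested for browse-image. The function name save_browse_image(), the example coordinates, and the output path are illustrative assumptions rather than part of the DANTE template.

```r
# Hypothetical helper: plot point locations and save them as a PNG sized
# for the browse-image field (300 x 225 pixels).
save_browse_image <- function(lon, lat, file = "browse-image.png") {
  grDevices::png(file, width = 300, height = 225)  # dimensions in pixels
  on.exit(grDevices::dev.off())                     # close the device on exit
  graphics::plot(lon, lat, pch = 20,
                 xlab = "Longitude", ylab = "Latitude",
                 main = "Observation locations")
  invisible(file)
}

# Made-up coordinates, written to a temporary directory for illustration
out <- save_browse_image(c(36.8, 32.6, 45.3), c(-1.3, 0.3, 2.0),
                         file = file.path(tempdir(), "browse-image.png"))
```

Inside the submission itself, the plotting call alone can be placed in a code chunk without the png() and dev.off() lines, and knitr will embed the figure in the HTML output when the document is knitted.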

Submitting the Template

After all relevant fields are complete, compile the HTML submission by knitting the document in RStudio. The submission is then ready, and the resulting dataset-name.html output can be pushed to the DANTE GitLab dataset repository or uploaded to the DANTE website.
