---
title: Introduction to ridigbio
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to ridigbio}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The ridigbio package can be used to obtain records from the [iDigBio](https://www.idigbio.org/) APIs, including both the [Search API](https://github.com/idigbio/idigbio-search-api/wiki) and the [Media APIs](https://www.idigbio.org/wiki/index.php/IDigBio_API#Record_.26_Media_APIs).

## General Overview

In this demo we will cover how to:

1. Install `ridigbio`
2. Search for records with `idig_search_records()`
3. Search for media records with `idig_search_media()`

## Getting Started

First, you must install the ridigbio package. If you are new to R and RStudio, please refer to our QUBES module to get started: Introduction to R with Biodiversity Data, [doi:10.25334/84FC-TE88](https://www.doi.org/10.25334/84FC-TE88).

The latest version of our R package can be installed via CRAN.

```{r eval=FALSE, include=TRUE}
install.packages("ridigbio")
```

Before downloading any records, you must load the ridigbio package.

```{r message=FALSE, warning=FALSE}
library(ridigbio)
```

```{r echo = FALSE}
# Check that the iDigBio API is reachable before running the examples.
# `verify_galax_records` is used as the `eval` option of the later chunks,
# so it is set to TRUE on success and FALSE if the test query fails.
verify_galax_records <- tryCatch({
  idig_search_records(rq = list(scientificname = "Galax urceolata"), limit = 10)
  TRUE
}, error = function(e) {
  cat("An error occurred during the idig_search_records call: ", e$message, "\n")
  cat("Vignettes will not be fully generated. Please try again after resolving the issue.")
  FALSE
})
```

## Download Records

To download records from the Search API, we will use the function `idig_search_records()`. Here the `rq`, or record query, indicates we want to download all the records where the `scientificname` is equal to [Galax urceolata](https://en.wikipedia.org/wiki/Galax).

```{r eval=verify_galax_records}
galax_records <- idig_search_records(rq=list(scientificname="Galax urceolata"))
```

```{r eval=verify_galax_records}
colnames(galax_records)
```

When `fields` is not specified, the default columns include the following:

| Column | Description |
|--------------------|--------------------------------------------------------|
| uuid | Universally Unique IDentifier assigned by iDigBio |
| occurrenceid | identifier for the occurrence |
| catalognumber | identifier for the record within the collection |
| family | scientific name of the family |
| genus | scientific name of the genus |
| scientificname | scientific name |
| country | country |
| stateprovince | name of the next smaller administrative region than country |
| geopoint.lon | equivalent to decimalLongitude |
| geopoint.lat | equivalent to decimalLatitude |
| datecollected | [Modified field that could lack biological meaning](https://github.com/iDigBio/idb-backend/issues/229) |
| data.dwc:eventDate | equivalent to eventDate |
| data.dwc:year | year of collection event |
| data.dwc:month | month of collection event |
| data.dwc:day | day of collection event |
| collector | equivalent to recordedBy |
| recordset | the iDigBio recordset the observation belongs to |

### More ways to search

In addition to `scientificname`, a record query can be based on many other fields. For example, you can search for all members of the `family` [Diapensiaceae](https://en.wikipedia.org/wiki/Diapensiaceae):

```{r eval=verify_galax_records}
diapensiaceae_records <- idig_search_records(rq=list(family="Diapensiaceae"), limit=1000)
```

**What if you want to read in all the points for a family within an extent?**

**Hint**: Use the [iDigBio portal](https://www.idigbio.org/portal/search) to determine the bounding box for your region of interest. The bounding box delimits the geographic extent.

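The `geo_bounding_box` query in this section is defined by the top-left and bottom-right corners of the box. As a minimal sketch, the helper below builds that list from the coordinate limits read off the portal map. `make_bbox_query()` is a hypothetical function written for this illustration and is not part of ridigbio; it also assumes the box does not cross the antimeridian.

```{r}
# Hypothetical helper (not part of ridigbio): build the geopoint part of a
# record query from the limits of a bounding box, in decimal degrees.
# Assumes lon_min < lon_max, i.e. the box does not cross the antimeridian.
make_bbox_query <- function(lon_min, lon_max, lat_min, lat_max) {
  stopifnot(lon_min < lon_max, lat_min < lat_max)
  list(
    type = "geo_bounding_box",
    # top-left corner: westernmost longitude, northernmost latitude
    top_left = list(lon = lon_min, lat = lat_max),
    # bottom-right corner: easternmost longitude, southernmost latitude
    bottom_right = list(lon = lon_max, lat = lat_min)
  )
}

bbox <- make_bbox_query(lon_min = -98.16, lon_max = -64.02,
                        lat_min = 23.06, lat_max = 48.92)
str(bbox)
```

The resulting list can be supplied as the `geopoint` element of `rq`.
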
The query below combines three conditions: `scientificname` must exist, `family` must equal Diapensiaceae, and `geopoint` must fall within the bounding box.

```{r eval=verify_galax_records}
rq_input <- list("scientificname"=list("type"="exists"),
                 "family"="Diapensiaceae",
                 geopoint=list(
                   type="geo_bounding_box",
                   top_left=list(lon = -98.16, lat = 48.92),
                   bottom_right=list(lon = -64.02, lat = 23.06)
                 )
)
```

Search using the input you just made:

```{r eval=verify_galax_records}
diapensiaceae_records_USA <- idig_search_records(rq_input, limit=1000)
```

## Download Media Records

To download media records from the Media API, we will use the function `idig_search_media()`. Here the `rq`, or record query, indicates we want to download the media records associated with records whose `scientificname` is equal to [Galax urceolata](https://en.wikipedia.org/wiki/Galax).

```{r eval=verify_galax_records}
galax_media <- idig_search_media(rq=list(scientificname="Galax urceolata"))
```

```{r eval=verify_galax_records}
colnames(galax_media)
```

When `fields` is not specified, the default columns include the following:

| Column | Description |
|----------------|---------------------------------------------------------|
| accessuri | unique identifier for a resource |
| datemodified | date last modified, which is assigned by iDigBio |
| dqs | data quality score assigned by iDigBio |
| etag | tag assigned by iDigBio |
| flags | data quality flag assigned by iDigBio |
| format | media format |
| hasSpecimen | TRUE or FALSE, indicates if there is an associated record for this media |
| licenselogourl | media license |
| mediatype | media object type |
| modified | date modified |
| recordids | list of UUIDs for associated records |
| records | UUID for the associated record. Use this field to connect Record downloads with Media downloads |
| recordset | the iDigBio recordset the observation belongs to |
| rights | media rights |
| tag | general keywords or tags |
| type | media type |
| uuid | Universally Unique IDentifier assigned by iDigBio |
| version | media record version assigned by iDigBio |
| webstatement | media rights |
| xpixels | as defined by EXIF, x dimension in pixels |
| ypixels | as defined by EXIF, y dimension in pixels |

### More ways to search

The media search above retained `r tryCatch({if(nrow(galax_media)) nrow(galax_media) else "N/A"}, error = function(e){cat("error in vignette: ", e$message)})` rows; however, some of these observations do not have information in the `accessuri` field. To obtain only records with `accessuri`, we indicate that we only want records where `data.ac:accessURI` exists by setting `mq`, or media query, as follows:

```{r eval=verify_galax_records}
galax_media2 <- idig_search_media(rq=list(scientificname="Galax urceolata"),
                                  mq=list("data.ac:accessURI"=list("type"="exists")))
```

Now we have `r tryCatch({if(nrow(galax_media2)) nrow(galax_media2) else "N/A"}, error = function(e){cat("error in vignette: ", e$message)})` observations with `accessuri`!
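
The `records` column described in the media table can be used to link media rows to record rows. The sketch below illustrates one way to do this with base R's `merge()`, using two small made-up data frames in place of real downloads. All UUIDs, URLs, and values are invented, and the sketch assumes `records` holds a single UUID per row as a character column; if it is returned as a list column, it would need to be flattened first.

```{r}
# Toy stand-ins for the results of idig_search_records() and
# idig_search_media(); every UUID, URL, and value here is invented.
toy_records <- data.frame(
  uuid = c("rec-001", "rec-002", "rec-003"),
  scientificname = "galax urceolata",
  stateprovince = c("north carolina", "virginia", "georgia"),
  stringsAsFactors = FALSE
)
toy_media <- data.frame(
  uuid = c("med-001", "med-002"),
  records = c("rec-001", "rec-003"),
  accessuri = c("https://example.org/a.jpg", "https://example.org/b.jpg"),
  stringsAsFactors = FALSE
)

# Match the media `records` column against the record `uuid` column.
# Only media with a matching record are kept (an inner join).
media_with_records <- merge(
  toy_media, toy_records,
  by.x = "records", by.y = "uuid"
)
media_with_records
```

In the merged data frame, `records` holds the record UUID and `uuid` holds the media UUID, so each image can be traced back to its specimen record.
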