---
title: "genesysr Tutorial"
author: "Matija Obreza & Nora Castaneda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{genesysr Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Querying Genesys PGR
=====================

[Genesys PGR](https://www.genesys-pgr.org) is the global database on plant genetic resources maintained *ex situ* in national, regional and international genebanks around the world.

**genesysr** uses the [Genesys API](https://www.genesys-pgr.org/documentation/apis) to query Genesys data. The API is accessible at https://api.genesys-pgr.org.

Accessing data with **genesysr** is similar to downloading data in CSV or Excel format and loading it into R.

## For the impatient

Accession passport data is retrieved with the `get_accessions` function.

The database is queried by providing a `filter` (see Filters below):

```
library(genesysr)

## Setup: use Genesys Sandbox environment
# genesysr::setup_sandbox() # Use this to connect to our test environment https://sandbox.genesys-pgr.org
# genesysr::setup_production() # This is initialized by default when loading genesysr

# Open a browser: login to Genesys and authorize access
genesysr::user_login()

# Retrieve first 1000 accessions for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = c('Musa'))), at.least = 1000)

# Or retrieve all accession data for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = c('Musa'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International
itc <- get_accessions(list(institute = list(code = c('BEL084'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International (BEL084) and the International Center for Tropical Agriculture (COL003)
some <- get_accessions(list(institute = list(code = c('BEL084','COL003'))))
```

**genesysr** provides utility functions to create `filter` objects using [Multi-Crop Passport
Descriptors (MCPD)](https://www.genesys-pgr.org/documentation/basics) definitions:

```
# Retrieve data by country of origin (MCPD)
get_accessions(mcpd_filter(ORIGCTY = c("DEU", "SVN")))
```

# Processing fetched data

The data is provided by Genesys as CSV. Where multiple values are possible for a column, there will be multiple columns. For example, accession `STORAGE` may be provided as:

|...|storage1|storage2|storage3|
|--|--|--|--|
|...|10|20|30|
|...|30|40|*NA*|
|...|30|*NA*|*NA*|
|...|10|20|30|

# Filters

The `filter` object is a named `list()` where each name matches a Genesys filter and its value specifies the criteria to match.

The records returned by Genesys match all filters provided (*AND* operation), while individual filters allow for specifying multiple criteria (*OR* operation):

```r
# (GENUS == Musa) AND (SPECIES == aa) AND ((ORIGCTY == NGA) OR (ORIGCTY == CIV))
filter <- list(taxonomy = list(genus = c('Musa'), species = c('aa')), countryOfOrigin = list(iso3 = c('NGA', 'CIV')))

# Or build the same filter step by step
filter <- list()
filter$taxonomy$genus <- c('Musa')
filter$taxonomy$species <- c('aa')
filter$countryOfOrigin$iso3 <- c('NGA', 'CIV')

# See filter object as JSON
jsonlite::toJSON(filter)
```

There are a number of filtering options to retrieve data from Genesys. The best way to explore how filtering works is to use the actual website https://www.genesys-pgr.org/a/overview, inspect the HTTP requests sent by your browser to the API server, and then replicate them here.

### Taxonomy

`taxonomy$genus` filters by a *list* of genera.

```r
filters <- list(taxonomy = list(genus = c('Hordeum', 'Musa')))
# Print
jsonlite::toJSON(filters)
```

`taxonomy$species` filters by a *list* of species.

```r
filters <- list(taxonomy = list(genus = c('Hordeum'), species = c('vulgare')))
# Print
jsonlite::toJSON(filters)
```

### Origin of material

`countryOfOrigin$iso3` filters by the ISO3 code of the country of origin of PGR material.


```r
# Material originating from Germany (DEU) or France (FRA)
filters <- list(countryOfOrigin = list(iso3 = c('DEU', 'FRA')))
```

`geo$latitude` and `geo$longitude` filter by latitude/longitude (in decimal format) of the collecting site.

```r
# TBD
filters <- list(geo = list(latitude = genesysr::range(-10, 30), longitude = genesysr::range(30, 50)))
```

### Holding institute

`institute$code` filters by a *list* of FAO WIEWS institute codes of the holding institutes.

```r
# Filter for ITC (BEL084) and CIAT (COL003)
list(institute = list(code = c('BEL084', 'COL003')))
```

`institute$country$iso3` filters by a *list* of ISO3 country codes of the country of the holding institute.

```r
# Filter for genebanks in Slovenia (SVN) and Belgium (BEL)
list(institute = list(country = list(iso3 = c('SVN', 'BEL'))))
```

# Selecting columns

The Genesys API returns many variables for accession passport data. To reduce the amount of data to be processed and kept in memory, select the columns of interest with the `fields` vector:

```
# Fetch only accession id, storage and taxonomic data for *Musa*
musa <- genesysr::get_accessions(list(taxonomy = list(genus = c('Musa'))), fields = c("taxonomy", "storage", "id"))
```

To list the variable names returned by the Genesys API, inspect a sample response and pick the columns of interest:

```r
# fetch_accessions uses the JSON format
accn <- fetch_accessions(filters = list(), at.least = 100)
# Print names used in JSON response from Genesys
sort(unique(names(unlist(accn$content))))
```

# Step-by-step example

Let's walk through the whole process of fetching accession passport data from Genesys.

1. Load genesysr

```r
library(genesysr)
```

2. Set up the environment and log in with user credentials

```r
setup_sandbox()
user_login()
```

3. Fetch data

```r
musa <- genesysr::get_accessions(list(taxonomy = list(genus = c('Musa'))), at.least = 1000)
```

4. Identify columns of interest

```r
names(musa)
```
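
Once the columns are identified, multi-valued fields such as `storage1`, `storage2`, `storage3` (see *Processing fetched data*) usually need to be combined before analysis. The following is a minimal sketch in base R, not part of **genesysr**. It does not contact Genesys: `accessions` is a hypothetical stand-in for a fetched data frame, its `id` values are made up, and the storage values copy the table shown earlier. It also assumes the numbered columns follow the `storage<N>` naming pattern.

```r
# Hypothetical stand-in for a data frame returned by get_accessions();
# the ids are invented, the storage values mirror the table in
# "Processing fetched data".
accessions <- data.frame(
  id = c("acc-1", "acc-2", "acc-3", "acc-4"),
  storage1 = c(10, 30, 30, 10),
  storage2 = c(20, 40, NA, 20),
  storage3 = c(30, NA, NA, 30),
  stringsAsFactors = FALSE
)

# Find the numbered columns that belong to the same multi-valued field
storage_cols <- grep("^storage[0-9]+$", names(accessions), value = TRUE)

# Collapse the non-missing values of each row into one ";"-separated string
accessions$storage <- apply(
  accessions[, storage_cols, drop = FALSE], 1,
  function(values) paste(values[!is.na(values)], collapse = ";")
)

# Count how many accessions report each storage code (NA is ignored)
storage_counts <- table(unlist(accessions[storage_cols]))

print(accessions[, c("id", "storage")])
print(storage_counts)
```

Keeping the original numbered columns alongside the collapsed `storage` column makes it easy to switch between per-row summaries and per-code counts.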