`Query.Rmd`

The `queryLipdverse()`

function in `lipdR`

allows users to download LiPD files based on various filter
parameters

The filtering relies on a query table, which holds metadata for all time series available on LiPDverse

This query table is built in to, and loaded with lipdR

`library(lipdR)`

Let’s get a sense for what the table holds

```
dim(queryTable)
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
names(queryTable)
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
```

That’s a lot of rows! There’s a row for every time series in the LiPDverse database

If you have used the LiPD format, you will recognize some of these names

Let’s try a simple query

We’ll set `skip.update = FALSE`

for this demonstration,
but it’s a good idea to check for updates each time you start a new
session

If you tell `queryLipdverse`

to `skip.update`

once, it will not prompt you again in a given R session

```
qt <- queryLipdverse(variable.name = c("δ18O"),
skip.update = TRUE)
#> Based on your query parameters, there are 0 available time series in 0 datasets
```

huh, there must be more Oxygen-18 time series

Let’s have a look at the unique paleoData
`variable.name`

We’ll just look at the first 250, there are quite a few

```
unique(queryTable$paleoData_variableName)[1:250]
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
```

There’s quite a few notation styles, but they all have “18o” in common, so let’s try that

```
qt <- queryLipdverse(variable.name = c("18o"))
#> Based on your query parameters, there are 1688 available time series in 1069 datasets
```

Now that is a lot more data

Some query parameters will require considering what all the possible results have in common

In other cases, we can simply input a vector with several possible options

For the `variable.name`

parameter, multiple entries are
combined with “OR” logic, so more entries will generally pull more
datasets

```
qt <- queryLipdverse(variable.name = c("δ18O", "d18O"))
#> Based on your query parameters, there are 1688 available time series in 1069 datasets
```

This gets us almost the same result, but we see the simplified filter is still pulling one more dataset

Also note that capitalization makes no difference here

The `archive.type`

filter works similarly, let’s look at
our options

```
unique(queryTable$archiveType)
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
```

Note that we have distinct options: “Lake”, “LakeSediment”, “LakeDeposits”, “LakeDeposit”, and “Lake Sediment”

We can grab the `archive.type`

names and search for all of
them like this

```
qt <- queryLipdverse(archive.type = unique(queryTable$archiveType)[c(2,3,5,6,39)])
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
```

Or we can use the simplification strategy again

```
qt <- queryLipdverse(archive.type = c("lake"))
#> Based on your query parameters, there are 46308 available time series in 2309 datasets
```

As we saw last time, the results are similar

We can also filter based on publications: Author, DOI, Title, etc.
using `pub.info`

```
qt <- queryLipdverse(pub.info = c("10.1016/j.quascirev.2008.09.005"))
#> Based on your query parameters, there are 20 available time series in 7 datasets
```

This query can be a little slow if you don’t narrow the results with another parameter first

Let’s narrow our region of interest

There are four different parameters used for this:
`coord`

, `country`

, `continent`

, and
`ocean`

Let’s grab all the North American datasets

```
qt <- queryLipdverse(continent = "North America")
#> Based on your query parameters, there are 46652 available time series in 1817 datasets
```

Let’s grab just those from Mexico

```
qt <- queryLipdverse(country = c("Mexico"))
#> Based on your query parameters, there are 512 available time series in 36 datasets
```

Note that the country and contient filters are not filtered based on LiPD content. A function in R uses the coordinates associated with the datasets to associate them with countries and the results can be unreliable near country borders

Again, we can see all of the options for `country`

and
`continent`

with `unique()`

We can also use latitude and longitude directly with a bounding box

```
qt <- queryLipdverse(coord = c(0,90,-180,-110))
#> Based on your query parameters, there are 14062 available time series in 772 datasets
```

We can limit this to only marine data by setting `ocean`

to `TRUE`

```
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE)
#> Based on your query parameters, there are 1485 available time series in 85 datasets
```

The `ocean`

parameter works on the same basis as the
`country`

and `continent`

parameters and can be
unreliable in coastal areas

We can also grab all the data from a `compilation`

, such
as the Western North America (WNAm)

```
qt <- queryLipdverse(compilation = c("wNAm"))
#> Based on your query parameters, there are 392 available time series in 184 datasets
```

or multiple compilations

```
qt <- queryLipdverse(compilation = c("wnam", "temp12k"))
#> Based on your query parameters, there are 1717 available time series in 799 datasets
```

the `compilation`

filter uses “OR” logic

Let’s try pulling datasets based on their
`seasonality`

We can pull summer, commonly defined as June, July, and August, for the Northern Hemisphere

`seasonality`

input is taken as a list

```
qt <- queryLipdverse(continent = "North America",
seasonality = list("June", "July", "August"))
#> Based on your query parameters, there are 1 available time series in 1 datasets
```

Okay, so this must not be a good choice of format, let’s see how LiPD authors define their seasons

```
unique(queryTable$interpretation1_seasonality)
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
```

It looks like numeric months are most common. Season names, “warm”/“cold”, and series of first-letter abbreviations (ie. JJA) are all common.

Knowing this, let’s try again

Items within a single list are treated as linked by “AND”, so an input of list(“June”, “July”, “August”) would filter for season data with ALL of these months

Multiple lists are treated as linked by “OR”, such that list(list(“June”), list(“July”), list(“August)) would filter for season data with ANY of these months

Let’s try to get all of the summer datasets by entering a few different notations

```
qt <- queryLipdverse(continent = "North America",
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")))
#> Based on your query parameters, there are 219 available time series in 165 datasets
```

This returns quite a few datasets

From our look at all the unique `seasonality`

entries, we
can see that this probably includes a lot of annual data too

Let’s exclude the annual and winter datasets by using
`season.not`

The input for for `season.not`

works the same as
`seasonality`

```
qt <- queryLipdverse(continent = "North America",
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")))
#> Based on your query parameters, there are 193 available time series in 139 datasets
```

Now we’ve narrowed it down to summer-specific datasets

Let’s look at interpretation variables now. These are the climate variables that may serve as a target for the proxy time series available

These variables are expressed in two different formats: interpretation variable and interpretation detail

Each of these variables has four possible interpretation slots

```
unique(queryTable$interp_Vars)
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
unique(queryTable$interp_Details)
#> Error in eval(expr, envir, enclos): object 'queryTable' not found
```

Let’s look at some marine `interp.vars`

in the northeast
Pacific

```
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE,
interp.vars = c("SST", "upwelling", "SSS"))
#> Based on your query parameters, there are 22 available time series in 7 datasets
```

That gives us just a few datasets

Perhaps we’ll have more luck with `interp.details`

, which
is more standardized

```
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"))
#> Based on your query parameters, there are 59 available time series in 38 datasets
```

let’s see if we grab more using both

These inputs combine with “OR” logic, so we may gather more datasets by using both parameters

```
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"),
interp.vars = c("SST", "upwelling", "SSS"))
#> Based on your query parameters, there are 81 available time series in 42 datasets
```

looks like we get a few extra datasets with this approach

Now that we know how to use our filters, let’s go for a strict filter

We’ll look for marine archives in the northeast Pacific, with interpretations related to marine climate variables in the summer months only

```
qt <- queryLipdverse(coord = c(0,90,-180,-110),
archive.type = c("marine", "ocean"),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"),
interp.vars = c("SST", "upwelling", "SSS"),
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")))
#> Based on your query parameters, there are 7 available time series in 1 datasets
```

To fine-tune your query, set verbose to TRUE to see which parameters have what effect on filtering

We’ll narrow the results further by adding author names to find within the publication info

```
qt <- queryLipdverse(coord = c(0,90,-180,-110),
archive.type = c("marine", "ocean"),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"),
interp.vars = c("SST", "upwelling", "SSS"),
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")),
pub.info = c("mix", "caissie"),
verbose = TRUE)
#> Series available before filtering: 103920
#>
#> Series remaining after coord filter: 14062
#>
#> Series remaining after marine filter: 1485
#>
#> Series remaining after continent filter: 1485
#>
#> Series remaining after country filter: 1485
#>
#> Series remaining after time filter: 1485
#>
#> Series remaining after paleo.proxy filter: 1485
#>
#> Series remaining after paleo.units filter: 1485
#>
#> Series remaining after archive.type filter: 893
#>
#> Series remaining after variable.name filter: 893
#>
#> Series remaining after interp.vars filter: 893
#>
#> Series remaining after interp.details filter: 76
#>
#> Series remaining after compilation filter: 76
#>
#> Series remaining after seasonality filter(s): 7
#>
#> Series remaining after pub.info filter: 0
#>
#> Based on your query parameters, there are 0 available time series in 0 datasets
```

When you’re satisfied with the query results, we can simply put the
filtered query table into the `readLipd()`

function to
download the datasets

```
D <- readLipd(qt)
#> Error in value[[3L]](cond): Error: get_src_or_dst: Error in if (!dir.exists(path)) {: argument is of length zero
```