LiPD utilities for R • lipdR

Welcome to the lipdR package, a set of LiPD Utilities in R. This guide provides everything you need to get up and running with the LiPD Utilities in R and shows you how to use the core functions in the lipdR package.

Features

Read & write LiPD files
Extract & collapse a time series for data analysis
Filter & query a time series for subset data

Requirements

R, the language, is available from CRAN. RStudio is an IDE that works with R and makes your workflow much easier. R Tools is required for Windows users. Software version numbers are listed at the top of this README file. If you do not have any datasets in LiPD format yet, feel free to experiment with the one below.

R

https://cran.r-project.org

RStudio

https://www.rstudio.com

R Tools (Windows Users)

https://cran.r-project.org/bin/windows/Rtools/

LiPD file (for test purposes)

ODP1098B13.lpd

Installation

Create a new project in RStudio and start with a fresh workspace.

Install remotes in the console window:

install.packages("remotes")

Load the remotes package:

library(remotes)

Use remotes to install the LiPD Utilities package from GitHub:

remotes::install_github("nickmckay/lipdR")

Load the lipdR package:

library(lipdR)

And that’s it! You have successfully installed the LiPD utilities and are ready to start working.

Core functions

Notation:

This guide uses notation that may be new to you. In case you are unfamiliar with these terms, the list below provides an explanation for each. Please feel free to name your own variables as you move through the guide.

D

Represents multiple datasets read into a single variable. Each dataset is stored under its dataset name, for example D[["ODP1098B13"]][["paleoData"]] or D$ODP1098B13$paleoData.

L

Represents a single dataset. Because there is only one dataset, it is not stored under its name: L[["paleoData"]] or L$paleoData.

ts

A time series. The ts notation is used both in variable names and in time-series-related functions, such as extractTs.

readLipd(path = "")

Purpose: Reads LiPD files from a source path into the environment. Read in a single file, or a directory of multiple files.

Parameters:

path (optional): The path to the locally stored file or directory that you would like to read into the workspace.

Example: Provide a path to a file
L = readLipd("/Users/bobsmith/Downloads/filename.lpd")

Example: Provide a path to a directory
D = readLipd("/Users/bobsmith/Desktop")

Example: Browse for a file or directory
D = readLipd()

Returns: Multiple datasets D or one dataset L

Example 1: Browse for file

Call readLipd without a path:

L = readLipd()

Leave the path empty in this example. A prompt asks whether you want to read a single file or a directory. Choose s to read a single file.

A browse dialog opens and asks you to choose the LiPD file that you want. Here, I select the file and click “Open”.

The console shows the name of the file being read. When reading is finished, the > prompt appears again.

The LiPD file loads into the environment under variable L. In the RStudio Environment pane, click the arrow next to L to preview some of its data.

A quick look at L shows that the data is at the root of the variable, as expected.

Example 2: Browse for directory

NOTE:

Reading a directory is most commonly used for reading multiple files. I have added LiPD files to my source folder and will load them into variable D.

Call readLipd without a path, this time storing the result in D:

D = readLipd()

Leave the path empty in this example. A prompt asks whether you want to read a single file or a directory. Choose d to read a directory.

NOTE:

Due to a bug in R, we are not able to use the graphical directory chooser. It causes RStudio to crash, and that’s not an experience we want to give you. The instructions below are a workaround that provides the same result.

A browse dialog opens. Choose any LiPD file within the directory that you want to read. For example, to load all the LiPD files in the quickstart directory, I choose ODP1098B13.lpd. Choosing either of the other two LiPD files has the same outcome.

The console shows that 3 files have been read, and processing is finished.

The LiPD files load into the environment under variable D. The D variable shows that it is a list of 3, which matches the number of LiPD files in the source directory. Success!

A quick look at D shows that the datasets are organized by dataset name, as expected. One level further down, you can see the data.

REMEMBER

Since D contains multiple datasets, we organize the data by dataSetName. Since L only holds one dataset, we do not use this dataSetName layer, and instead link directly to the data.
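To make the difference concrete, here is a minimal base R sketch on mock data. The nested lists are a simplified stand-in for what readLipd returns (the real hierarchy is much deeper), and is_single_dataset is a hypothetical helper, not a lipdR function. It assumes, as described above, that a single dataset L has paleoData at its root while D is keyed by dataset name.

```r
# Mock data: a simplified stand-in for the nested lists that readLipd() returns.
# The real LiPD hierarchy is much deeper; only the top level is modeled here.
L <- list(dataSetName = "ODP1098B13", paleoData = list("paleo tables go here"))

# D holds several datasets, each stored under its dataSetName.
D <- list(
  ODP1098B13 = L,
  CO00COKY = list(dataSetName = "CO00COKY", paleoData = list("paleo tables go here"))
)

# Bracket and $ notation reach the same element.
identical(D[["ODP1098B13"]][["paleoData"]], D$ODP1098B13$paleoData)  # TRUE

# L skips the dataSetName layer, so the same data sits one level higher.
identical(L[["paleoData"]], D$ODP1098B13$paleoData)                  # TRUE

# Hypothetical helper (not in lipdR): a single dataset has paleoData at its root.
is_single_dataset <- function(x) "paleoData" %in% names(x)

is_single_dataset(L)  # TRUE
is_single_dataset(D)  # FALSE
```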

Example 3: Provide a path

If you have the path to a specific file or directory that you would like to read, you can read in data in fewer steps. Here I use a path to a file on my desktop:

L = readLipd("/Users/bobsmith/Desktop/ODP1098B13.lpd")

NOTE:

Be careful with relative paths. If the file you want to read is located in your current working directory (check it with getwd()), you can load it directly using the filename:

readLipd("ODP1098B13.lpd")

If the file is not in your current working directory, give the full path to the file:

readLipd("/Users/bobsmith/Desktop/ODP1098B13.lpd") or readLipd("~/Desktop/ODP1098B13.lpd")
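If you build paths in code, it can help to expand and check them before calling readLipd. The sketch below uses only base R (path.expand, file.exists, normalizePath, getwd); resolve_lpd_path is a hypothetical helper, not part of lipdR.

```r
# Hypothetical helper (not part of lipdR): expand "~", check that the file
# exists, and return an absolute path that does not depend on getwd().
resolve_lpd_path <- function(path) {
  full <- path.expand(path)  # expands a leading "~" to your home directory
  if (!file.exists(full)) {
    stop("No file found at '", full, "' (working directory: '", getwd(), "')")
  }
  normalizePath(full)  # plain filenames are resolved against getwd()
}

# Usage, assuming lipdR is loaded and the file exists at this location:
# L <- readLipd(resolve_lpd_path("~/Desktop/ODP1098B13.lpd"))
```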

writeLipd(D, path = "")

Purpose: Writes LiPD data from the environment as a LiPD file.

Parameters:

D: Multiple datasets D or one dataset L

path (optional): The directory path that you would like to write the LiPD file(s) to.

Example: Provide a destination directory path
writeLipd(D, "/Users/bobsmith/Desktop")

Example: Omit the path to browse for a destination
writeLipd(D)

Returns: This function does not return data.

Call writeLipd and pass it your LiPD data. In this case, I pass L, which represents one LiPD dataset, and omit the path so that I can browse for a destination:

writeLipd(L)

A dialog opens and asks you to choose a directory. Choose any file within the directory that you want to write to. (See Example 2 for readLipd for further explanation.)

The console window shows each data file as it is compressed into the LiPD file being written. The list should contain four .txt files, at least one .csv file, and one .jsonld file.
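To check what went into a written file, you can list the archive's contents from R. This sketch assumes the .lpd file is a standard zip archive that utils::unzip can read; count_by_extension is a hypothetical helper, not part of lipdR.

```r
# Hypothetical helper (not part of lipdR): tally file names by extension.
count_by_extension <- function(file_names) {
  ext <- tools::file_ext(file_names)  # "" for names without an extension
  table(ext[ext != ""])               # drop entries without an extension, e.g. directories
}

# Usage, assuming the .lpd file is a zip archive at this location:
# contents <- utils::unzip(path.expand("~/Desktop/ODP1098B13.lpd"), list = TRUE)
# count_by_extension(contents$Name)
```

Comparing the counts with the list above is a quick sanity check after writing.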

extractTs(D)

Purpose: Creates a time series from LiPD datasets. (See "What is a time series?" in the Help section below.)

Parameters:

D: Multiple datasets D or one dataset L

Returns: A time series (ts)

Call extractTs, passing your LiPD data:

ts = extractTs(L)

The time series is created and stored in the ts variable. Click the arrow next to ts in the Environment pane to see what its contents look like.

collapseTs(ts)

Purpose: Collapses a time series back into LiPD datasets. This function is lossless: the data returns to its original LiPD form, and any changes or edits you made to the time series persist. (This is the opposite of extractTs.) See "What is a time series?" in the Help section below.

Parameters:

ts: A time series

Returns: Multiple datasets D or one dataset L

Call collapseTs, passing the time series:

L2 = collapseTs(ts)

The goal of collapseTs is to recreate the same data (without losing anything) that you had before calling extractTs. This is most useful if you have edited the time series in some way.

In this example, L represents your original dataset, ts represents the time series, and L2 represents the new dataset. Note that L and L2 have the same number of elements and the same size.

Expanding L2 in the environment shows that the data has returned to the LiPD dataset hierarchy as before.

Help

What is a time series?

The LiPD dataset hierarchy is great for organization and giving context to data, but can be more difficult to sift through to find relevant information since it can often go 10+ levels deep.

A time series is a flattened set of data that makes data more approachable and is used to perform data analysis. A time series is a collection of time series objects.

1-to-1 ratio: 1 time series object = 1 measurement table column

Each object within a time series is made from one column of data in a measurement table. Note that this applies only to measurement table data: model data (ensemble, distribution, and summary tables) are not included when creating a time series.

Example 1: One dataset

ODP1098B13
- 1 measurement table
  - 5 columns
    - depth, depth1, SST, TEX86, age

extractTs creates a time series (ts) of 5 objects

Example 2: Multiple datasets

ODP1098B13
- 1 measurement table
  - 5 columns
    - depth, depth1, SST, TEX86, age
Ant-CoastalDML.Thamban.2006
- 1 measurement table
  - 2 columns
    - d18O, year
CO00COKY
- 1 measurement table
  - 2 columns
    - d18O, year

extractTs creates a time series (ts) of 9 objects
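The 1-to-1 rule in these examples can be sketched in base R. This is a simplified illustration on mock data, not the extractTs implementation: each dataset is reduced to a list of measurement tables, each table is a named list of empty columns (values are omitted because only the structure matters), and the dataSetName and variableName fields on each object are illustrative.

```r
# Simplified mock structure (not the real LiPD schema): each dataset is a list of
# measurement tables, and each table is a named list of columns. Column values
# are left empty because only the structure matters here.
make_table <- function(cols) setNames(rep(list(numeric(0)), length(cols)), cols)

datasets <- list(
  "ODP1098B13"                  = list(make_table(c("depth", "depth1", "SST", "TEX86", "age"))),
  "Ant-CoastalDML.Thamban.2006" = list(make_table(c("d18O", "year"))),
  "CO00COKY"                    = list(make_table(c("d18O", "year")))
)

# One time series object per measurement table column (the 1-to-1 rule).
flatten_tables <- function(datasets) {
  out <- list()
  for (ds_name in names(datasets)) {
    for (tbl in datasets[[ds_name]]) {
      for (col_name in names(tbl)) {
        out[[length(out) + 1]] <- list(
          dataSetName  = ds_name,   # illustrative field names
          variableName = col_name,
          values       = tbl[[col_name]]
        )
      }
    }
  }
  out
}

length(flatten_tables(datasets["ODP1098B13"]))  # 5 (Example 1)
length(flatten_tables(datasets))                # 9 (Example 2)
```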

How are queryTs and filterTs different?

It’s easy to confuse these two functions as they are almost identical in purpose. Here’s what you need to know:

queryTs:

This function returns the index numbers of objects that match your expression.

filterTs:

This function returns the actual data of objects that match your expression.
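The distinction mirrors the one between base R's which() and logical subsetting. The sketch below uses a mock time series and does not call queryTs or filterTs themselves (their expression syntax is not covered here); the field names are illustrative assumptions.

```r
# Mock time series: a list of objects with illustrative fields only.
ts <- list(
  list(dataSetName = "ODP1098B13", variableName = "SST"),
  list(dataSetName = "ODP1098B13", variableName = "age"),
  list(dataSetName = "CO00COKY",   variableName = "d18O")
)

# Evaluate one condition on every object: TRUE where it matches.
matches <- vapply(ts, function(obj) obj$variableName == "d18O", logical(1))

which(matches)  # index numbers of the matches (queryTs-style): 3
ts[matches]     # the matching objects themselves (filterTs-style)
```

Index numbers are handy when you want to modify objects in place within the full time series; the filtered objects are handy when you only want to work with the subset.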

How to Cite this code

Use this link to visit the Zenodo website. It provides citation information in many popular formats.
