Welcome to the lipdR package, a set of LiPD Utilities in R. This guide provides everything you need to get up and running with the LiPD Utilities in R, and shows you how to use the core functions in the lipdR package.
R, the language, is available from CRAN. R Studio is an IDE for R that will make your workflow much easier. R Tools is required for Windows users. Software version numbers are listed at the top of this README file. If you do not have any datasets in LiPD format yet, feel free to experiment with the one below.
R
R Studio
R Tools (Windows Users)
https://cran.r-project.org/bin/windows/Rtools/
LiPD file (for test purposes)
Create a new project in R Studio and start with a fresh workspace:
Install remotes in the console window:
Load the remotes package:
Use remotes to install the LiPD Utilities package from GitHub:
Load the lipdR package:
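The installation steps above can be sketched as the console commands below. `install.packages()` and `remotes::install_github()` are standard calls, but the repository path `nickmckay/lipdR` is an assumption here; confirm it against the project's GitHub page before running.

```r
# Install the 'remotes' helper package from CRAN (one-time step).
install.packages("remotes")

# Load remotes so that install_github() is available.
library(remotes)

# ASSUMPTION: "nickmckay/lipdR" is the GitHub repository for this package.
# Check the repository path on the project page before running this line.
remotes::install_github("nickmckay/lipdR")

# Load lipdR so that its functions (readLipd, writeLipd, extractTs, ...) are available.
library(lipdR)
```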
And that’s it! You have successfully installed the LiPD utilities and are ready to start working. Your console should now look similar to this:
Notation:
This guide uses notation that may be new to you. In case you are unfamiliar with these terms, the list below provides an explanation for each. Please feel free to name your own variables as you move through the guide.
`D`
Represents multiple datasets read into a single variable. Each dataset is organized by its dataset name.
`D[["ODP1098B13"]][["paleoData"]]` or `D$ODP1098B13$paleoData`
`L`
Represents a single dataset. The dataset does not need to be organized by name.
`L[["paleoData"]]` or `L$paleoData`
`ts`
A time series. The `ts` notation is used both in variable names and in time series related functions, like `extractTs`.
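To make the difference between `D` and `L` concrete, here is a minimal base R sketch. The nested lists are invented placeholders, not real LiPD content; they only mimic the layering described above.

```r
# Toy stand-in for two datasets read into one variable; the contents are invented.
D <- list(
  ODP1098B13 = list(paleoData = list("paleo table placeholder")),
  CO00COKY   = list(paleoData = list("another placeholder"))
)

# A single dataset has no dataset-name layer.
L <- D[["ODP1098B13"]]

# Double brackets and `$` reach the same element.
identical(D[["ODP1098B13"]][["paleoData"]], D$ODP1098B13$paleoData)  # TRUE
identical(L[["paleoData"]], L$paleoData)                              # TRUE

names(D)  # dataset names: "ODP1098B13" "CO00COKY"
names(L)  # top-level fields of one dataset: "paleoData"
```

Either access style works; the double-bracket form is safer when a dataset name contains characters such as `-` or `.`.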
Purpose: Reads LiPD files from a source path into the environment. Read in a single file, or a directory of multiple files.
Parameters:
> `path` (optional)
>
> The path to the locally stored file or directory that you would like to read into the workspace
>
> Example: Provide a path to a file
>
> `L = readLipd("/Users/bobsmith/Downloads/filename.lpd")`
>
> Example: Provide a path to a directory
>
> `D = readLipd("/Users/bobsmith/Desktop")`
>
> Example: Browse for file or directory
>
> `D = readLipd()`

Returns:
> `D`
>
> Multiple datasets `D` or one dataset `L`
Example 1: Browse for file
Call `readLipd` as shown below:
Leave the path empty in this example. A prompt will ask you to choose between reading a single file or a directory. Choose `s` to read a single file:
A browse dialog opens and asks you to choose the LiPD file that you want. Here, I have selected the file and clicked “Open”:
The console shows the name of the current file being read. When the file is finished reading, the `>` indicator appears again and the process is finished (shown on the left).
The LiPD file loads into the environment under variable `L`. The environment `L` variable allows you to preview some of the LiPD data with the dropdown arrow (shown on the right).
A quick look at `L` shows that the data is at the root of the variable, as expected.
Example 2: Browse for directory
NOTE:
Reading a directory is most commonly used for reading multiple files. I have added LiPD files to my source folder and will load them into variable `D`.
Call `readLipd` as shown below:
Leave the path empty in this example. A prompt will ask you to choose between reading a single file or a directory. Choose `d` to read a directory:
NOTE:
Due to a bug in R, we are not able to use the module for choosing a directory with the GUI. It causes R Studio to crash and that’s not an experience we want to give you. The instructions below are a workaround that will provide the same result.
A browse dialog opens. Please choose any LiPD file within the directory that you want. For example, I want to load all the LiPD files in the `quickstart` directory, so I will choose `ODP1098B13.lpd`. Choosing either of the other two LiPD files has the same outcome.
The console shows that 3 files have been read, and processing is finished.
The LiPD files load into the environment under variable `D`. The `D` variable shows that it is a list of `3`, which matches the number of LiPD files in the source directory. Success!
A quick look at `D` shows that the datasets are sorted by dataset name, as expected. If you look one more level down, you can see the data.
REMEMBER
Since `D` contains multiple datasets, we organize the data by `dataSetName`. Since `L` only holds one dataset, we do not use this `dataSetName` layer, and instead link directly to the data.
Example 3: Provide a path
If you have the path to a specific file or directory that you would like to read, you can read in data in fewer steps. I’ll use a path to a file on my desktop.
NOTE:
R resolves relative paths against your current working directory (check it with `getwd()`). If the file you want to read is located in your current working directory, you can load it directly using the filename.
`readLipd("ODP1098B13.lpd")`
If a file is not in your current working directory, give an explicit path to the file.
`readLipd("/Users/bobsmith/Desktop/ODP1098B13.lpd")` or `readLipd("~/Desktop/ODP1098B13.lpd")`
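If you are unsure how a path will be interpreted, base R can show you before you call `readLipd`. The sketch below uses only base functions (`getwd`, `file.path`, `path.expand`, `file.exists`); the file name is the example file from this guide and may not exist on your machine.

```r
# The directory that bare filenames are resolved against.
wd <- getwd()

# A bare filename refers to a file inside the working directory...
bare <- "ODP1098B13.lpd"
# ...so it names the same location as this explicit path.
explicit <- file.path(wd, bare)

# "~" is shorthand for your home directory; path.expand() shows the full form.
home_path <- path.expand("~/Desktop/ODP1098B13.lpd")

# Check that the file exists before handing the path to readLipd().
file.exists(explicit)
```

If `file.exists()` returns `FALSE`, either change the working directory with `setwd()` or pass the full path instead.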
Purpose: Writes LiPD data from the environment as a LiPD file.
Parameters:
> `D`
>
> Multiple datasets `D` or one dataset `L`
>
> ---
>
> `path` (optional)
>
> The directory path that you would like to write the LiPD file(s) to.
>
> Provide a destination directory path:
>
> `writeLipd(D, "/Users/bobsmith/Desktop")`
>
> Or, omit the path to browse for a destination:
>
> `writeLipd(D)`

Returns:
> This function does not return data
Call `writeLipd` as shown below. Pass your LiPD data. In this case, I pass `L` to the function, which represents one LiPD dataset.
A dialog opens and asks you to choose a directory. Choose a file within the directory that you want to write to. (See Example 2 for `readLipd` for further explanation.)
The console window will show each data file as it is compressed into the LiPD file being written. The list should contain four `.txt` files, at least one `.csv` file, and one `.jsonld` file.
Purpose: Creates a time series from LiPD datasets. What is a time series?
Parameters:
> `D`
>
> Multiple datasets `D` or one dataset `L`

Returns:
> `ts`
>
> A time series

Call `extractTs` as shown below:
The time series is created and placed in the `ts` variable. Click the arrow next to the `ts` variable in the environment to see what the contents look like.
Purpose: Collapses a time series back into LiPD datasets. This function is lossless and returns the data to its original form. Any changes or edits that you made to the time series will persist. (This is the inverse of `extractTs`.) What is a time series?
Parameters:
> `ts`
>
> A time series

Returns:
> `D`
>
> Multiple datasets `D` or one dataset `L`
Call `collapseTs` as shown below:
The goal of `collapseTs` is to recreate the same data (without losing anything) that you had before calling `extractTs`. This is most useful if you have edited the time series in some way.
In this example, `L` represents your original dataset, `ts` represents the time series, and `L2` represents the new dataset. Note that `L` and `L2` have the same number of elements and the same size.
Expanding `L2` in the environment shows that the data has returned to the LiPD dataset hierarchy as before.
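The round trip described above can be sketched as follows. This is a hedged example: it uses the `readLipd`, `extractTs`, and `collapseTs` functions named in this guide with their single-argument forms, and the file path is a placeholder that you should replace with a LiPD file on your machine. The guard makes the block safe to run even when lipdR or the file is missing.

```r
# Placeholder path; replace it with a LiPD file on your machine.
lpd_path <- "~/Desktop/ODP1098B13.lpd"

# Run only when lipdR is installed and the file exists.
if (requireNamespace("lipdR", quietly = TRUE) && file.exists(path.expand(lpd_path))) {
  L  <- lipdR::readLipd(lpd_path)  # one dataset
  ts <- lipdR::extractTs(L)        # flatten into a time series
  L2 <- lipdR::collapseTs(ts)      # rebuild the dataset hierarchy

  # Compare the top-level structure of the original and rebuilt datasets.
  print(length(L) == length(L2))
  print(setdiff(names(L), names(L2)))
} else {
  message("lipdR or the example file is not available; skipping the round trip.")
}
```

An empty result from `setdiff()` means every top-level field of `L` is also present in `L2`.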
The LiPD dataset hierarchy is great for organization and giving context to data, but can be more difficult to sift through to find relevant information since it can often go 10+ levels deep.
A time series is a flattened set of data that makes data more approachable and is used to perform data analysis. A time series is a collection of time series objects.
1-to-1 ratio: 1 time series object = 1 measurement table column
Each object within a time series is made from one column of data in a measurement table. It’s important to note that this only pertains to measurement table data. Model data (ensemble, distribution, summary) is not included when creating a time series.
Example 1: One dataset
`extractTs` creates a time series (`ts`) of 5 objects
Example 2: Multiple datasets
ODP1098B13
Ant-CoastalDML.Thamban.2006
CO00COKY
`extractTs` creates a time series (`ts`) of 9 objects
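The one-object-per-column idea can be illustrated with a small base R sketch. Everything here is invented for illustration: the toy dataset, its field names, and the helper `flatten_toy` are hypothetical and do not reproduce the real LiPD structure or how `extractTs` is implemented.

```r
# Invented toy dataset: one measurement table with three columns.
L <- list(
  dataSetName = "ToyDataset",
  paleoData = list(
    measurementTable = list(
      depth = list(variableName = "depth", values = c(1, 2, 3)),
      age   = list(variableName = "age",   values = c(100, 200, 300)),
      d18O  = list(variableName = "d18O",  values = c(0.1, 0.2, 0.3))
    )
  )
)

# Hypothetical helper: build one flat object per measurement table column,
# copying dataset-level metadata onto each object so it stands on its own.
flatten_toy <- function(dataset) {
  columns <- dataset$paleoData$measurementTable
  lapply(unname(columns), function(col) {
    list(
      dataSetName  = dataset$dataSetName,
      variableName = col$variableName,
      values       = col$values
    )
  })
}

ts <- flatten_toy(L)
length(ts)  # 3: one object per column
```

Each flat object carries the context it needs, which is why a time series is easier to loop over than the nested hierarchy.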
It’s easy to confuse these two functions as they are almost identical in purpose. Here’s what you need to know:
`queryTs`: This function returns the index numbers of objects that match your expression.
`filterTs`: This function returns the actual data of objects that match your expression.
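The distinction between returning indices and returning data has a direct base R analogue, sketched below with an invented toy time series. This is only an analogy using `which()` and list subsetting; it does not call `queryTs` or `filterTs`, and it does not show their expression syntax.

```r
# Invented toy time series: each element stands in for one time series object.
ts <- list(
  list(dataSetName = "ODP1098B13", variableName = "depth"),
  list(dataSetName = "ODP1098B13", variableName = "d18O"),
  list(dataSetName = "CO00COKY",   variableName = "d18O")
)

# Evaluate the condition once for every object.
matches <- vapply(ts, function(obj) obj$variableName == "d18O", logical(1))

# "Query" style result: the positions of the matching objects.
idx <- which(matches)      # 2 3

# "Filter" style result: the matching objects themselves.
subset_ts <- ts[matches]   # a list of 2 objects

# Indexing with the query result gives the same objects as the filter result.
identical(ts[idx], subset_ts)  # TRUE
```

Indices are handy when you want to modify the original time series in place; the filtered data is handy when you want a smaller time series to analyze.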