In this class, we will go through some basic and essential functions of R.
Download R: https://cran.r-project.org/
Although R itself contains everything you need, its interface is quite raw and it does not come with a text editor.
RStudio is an integrated development environment (IDE) developed by Posit (formerly known as RStudio Inc.) that makes R nicer and easier to use. It includes 4 main panels, as shown in the following figure.
RStudio homepage: https://posit.co/products/open-source/rstudio/
Download RStudio: https://posit.co/download/rstudio-desktop/
A project is a working directory designated by an .Rproj file. When you open a project (using File | Open Project in RStudio or by double-clicking on the .Rproj file outside of R), the working directory is automatically set to the directory in which the .Rproj file is located. This keeps you from getting lost and ensures that everyone can run your scripts, because they rely on relative rather than absolute paths. A project can also be associated with version control.
Open RStudio
File | New Project
Select either ‘New Directory’ or ‘Existing Directory’. This will be the working folder for our course (on your computer)
Choose a location for the project on your computer
(e.g. ‘~/Desktop/Courses/Biodiversity’)
Name your project
Within your project’s folder, create several sub-folders:
data
scripts
results
figures
Open a new file: File | New File | R script
Type: # Day 1: getting started
Save file (in the scripts sub-folder created above): “0_Code_Day1.R”
Start coding!
In later classes, simply open the R project and reopen previous R scripts or create new ones within the project.
Local environment: where are we? What is in the console? Copy the
following code into your script and execute it line by line, either by clicking
on Run at the upper right corner of the script panel or by pressing
Ctrl+Enter. Also try typing something directly into the
console and executing it with Enter. This works but is not
recommended, because code typed into the console is not saved and cannot be reused later.
So always write your code in a script and save it.
## [1] "C:/POSTDOC/Teaching/Biodiversity_ecosystem_functioning/2025_2026"
## character(0)
## [1] "a"
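The commands behind the output above are not reproduced here. A minimal sketch, assuming they were getwd(), ls() and a simple assignment (the object name a and its value are assumptions):

```r
getwd()   # prints the current working directory (your project folder)
ls()      # lists the objects in the environment; character(0) means it is empty
a <- 1    # assumed example: create an object called a
ls()      # the new object is now listed
```

The path printed by getwd() will differ on your computer.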
The objects are also listed in RStudio under ‘Environment’ in the upper right-hand field.
If you would like to see this tutorial in your viewer within RStudio (saves space on the screen), please execute the following slightly more complicated code (after that it will be easier again):
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson1.html",
              destfile = file.path(dir, "lesson1.html"))
htmlFile <- file.path(dir, "lesson1.html")
rstudioapi::viewer(htmlFile)
Now you can easily copy code from the viewer into your script.
Calculating in R (Mathematical Operators)
## [1] 7
## [1] 12
## [1] 3
## [1] 9
## [1] 3
Note that to calculate the square root we used a “basic” R function, sqrt().
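The expressions that produced the results above are not shown. As a hedged sketch, the following operations cover the main arithmetic operators (the operands are assumptions, chosen only to match the printed results):

```r
3 + 4     # addition: 7
3 * 4     # multiplication: 12
12 / 4    # division: 3
3^2       # exponentiation: 9
sqrt(9)   # square root, computed with a function rather than an operator: 3
```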
Several types of data: vector, data.frame, matrix and list.
There are several data types available in R: numerics, integers (numeric without decimals), characters, factors and logicals.
## [1] "numeric"
## [1] "integer"
## [1] "character"
## [1] "factor"
## [1] "logical"
## [1] NA
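The calls behind this output are not shown. A minimal sketch, assuming class() was applied to one example value per type (the example values are assumptions):

```r
class(1.5)           # "numeric"
class(1L)            # "integer" (the L suffix marks an integer)
class("tree")        # "character"
class(factor("a"))   # "factor"
class(TRUE)          # "logical"
NA                   # NA marks a missing value
```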
Let’s now create vectors. A vector is simply a sequence of items that are all of the same type!
## [1] 1
## num 1
## [1] "numeric"
## [1] 1 3
## num [1:2] 1 3
## [1] 3
## [1] 5
## [1] "character"
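The code for this output is not included. The sketch below illustrates the same ideas under assumed names (a, v and w are illustrative) and does not try to reproduce every printed line:

```r
a <- 1            # a single number is a vector of length one
str(a)            # compact description: num 1
class(a)          # "numeric"

v <- c(1, 3)      # c() combines values into a vector
v[2]              # square brackets select an element by position: 3
length(v)         # number of elements: 2

w <- c(1, "a")    # mixing types coerces all elements to a common type
class(w)          # "character"
```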
Sequences can be created automatically.
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 1 3 5 7 9 11
b <- seq(1, 11, length.out = 6) # A sequence from 1 to 11 with length 6
vector_1 <- rep(c(1, 2), 3) # repeated c(1, 2) three times
vector_1
## [1] 1 2 1 2 1 2
## [1] 1 1 1 2 2 2
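Some of the calls in this part are missing. As a sketch, the following calls would produce the sequences printed above; that the original used exactly these arguments is an assumption:

```r
1:10                      # integers from 1 to 10
seq(1, 11, by = 2)        # from 1 to 11 in steps of 2: 1 3 5 7 9 11
rep(c(1, 2), each = 3)    # each element repeated three times: 1 1 1 2 2 2
```

Note the difference between rep(c(1, 2), 3), which repeats the whole vector, and rep(c(1, 2), each = 3), which repeats each element in turn.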
You can also calculate with vectors.
## [1] 1 4 9 16 25 36
## [1] 2 3 4 6 7 8
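Arithmetic on vectors works element by element. The definitions below are assumptions that are consistent with the two results above and with the data.frame built later from vector_2 and vector_3:

```r
vector_1 <- rep(c(1, 2), each = 3)   # assumed: 1 1 1 2 2 2
vector_2 <- 1:6                      # assumed: 1 2 3 4 5 6
vector_2^2                           # element-wise square: 1 4 9 16 25 36
vector_3 <- vector_1 + vector_2      # element-wise sum: 2 3 4 6 7 8
vector_3
```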
Now, let’s create a matrix with two rows and two
columns.
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
## num [1:2, 1:2] 1 0 0 1
## [1] "matrix" "array"
## [1] 0
## [1] 2 2
## [1] 2
## [1] 2
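The matrix code itself is not shown. A minimal sketch that would give the output above, assuming the matrix was created with matrix() and stored as mat (the name used in the code that follows):

```r
mat <- matrix(c(1, 0, 0, 1), nrow = 2, ncol = 2)  # values are filled column by column
mat
str(mat)      # num [1:2, 1:2] 1 0 0 1
class(mat)    # "matrix" "array" in R 4.0 and later
mat[1, 2]     # element in row 1, column 2: 0
dim(mat)      # number of rows and columns: 2 2
nrow(mat)     # 2
ncol(mat)     # 2
```

Indexing always follows the pattern [row, column]; leaving one side empty, as in mat[1, ], returns the whole row or column.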
# add rownames and column names with paste()
rownames(mat) <- paste("row", seq_len(nrow(mat)), sep = "_")
rownames(mat)
## [1] "row_1" "row_2"
##       col_1 col_2
## row_1     1     0
## row_2     0     1
A data.frame is a table whose columns can contain
different data types. The indexing is done like for a matrix but we can
also directly refer to column names.
dat <- data.frame(ID = vector_2,
                  number_1 = vector_3/2,
                  color = c("blue", "red", "green", "yellow", "grey", "black"))
dat
##   ID number_1  color
## 1  1      1.0   blue
## 2  2      1.5    red
## 3  3      2.0  green
## 4  4      3.0 yellow
## 5  5      3.5   grey
## 6  6      4.0  black
## 'data.frame': 6 obs. of 3 variables:
## $ ID : int 1 2 3 4 5 6
## $ number_1: num 1 1.5 2 3 3.5 4
## $ color : chr "blue" "red" "green" "yellow" ...
## [1] "data.frame"
## [1] "ID" "number_1" "color"
## [1] 1
## [1] 1
## [1] 1.0 1.5 2.0 3.0 3.5 4.0
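The inspection and indexing calls are not shown. A self-contained sketch, assuming calls such as str(), class(), names() and the two indexing styles mentioned above (the data.frame is rebuilt here from literal values so that the block runs on its own):

```r
dat <- data.frame(ID = 1:6,
                  number_1 = c(2, 3, 4, 6, 7, 8) / 2,
                  color = c("blue", "red", "green", "yellow", "grey", "black"))
str(dat)        # structure: 6 obs. of 3 variables
class(dat)      # "data.frame"
names(dat)      # column names: "ID" "number_1" "color"
dat[1, 1]       # matrix-style indexing, row 1 and column 1: 1
dat$ID[1]       # the same value, addressed through the column name: 1
dat$number_1    # a whole column as a vector
```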
And finally a list with two elements.
## $l1
## [1] 1 0
##
## $l2
## [1] 0 1
## List of 2
## $ l1: num [1:2] 1 0
## $ l2: num [1:2] 0 1
## [1] "list"
## [1] "l1" "l2"
## [1] 1 0
## $l1
## [1] 1 0
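The list code is missing here. A sketch that is consistent with the output above, assuming the list is called ex_list (the name used in the code that follows):

```r
ex_list <- list(l1 = c(1, 0), l2 = c(0, 1))
ex_list
str(ex_list)      # List of 2
class(ex_list)    # "list"
names(ex_list)    # "l1" "l2"
ex_list$l1        # element by name, returns the vector 1 0
ex_list[[1]]      # element by position, same result
ex_list[1]        # single brackets return a list of length one
```

The difference between [[ ]] and [ ] matters: the first extracts the element itself, the second returns a shorter list that still contains it.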
# we can add several formats in our list, for example our data.frame
ex_list[[3]] <- dat
str(ex_list)
## List of 3
## $ l1: num [1:2] 1 0
## $ l2: num [1:2] 0 1
## $ :'data.frame': 6 obs. of 3 variables:
## ..$ ID : int [1:6] 1 2 3 4 5 6
## ..$ number_1: num [1:6] 1 1.5 2 3 3.5 4
## ..$ color : chr [1:6] "blue" "red" "green" "yellow" ...
## [1] 3
R can import many data formats, including Excel sheets and .csv files.
Here we import tropical forest plot data from Barro Colorado Island (BCI) in Panama. Please download the file BCI_Data.csv from Stud-IP and copy it into the data directory in your project folder.
BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # the first line of the file contains column names
                  sep = ",", # character that separates the columns
                  row.names = 1) # use the first column as row names
class(BCI)
## [1] "data.frame"
## [1] 225
## [1] 50
##                       Plot1 Plot2 Plot3 Plot4 Plot5
## Abarema.macradenium       0     0     0     0     0
## Acacia.melanoceras        0     0     0     0     0
## Acalypha.diversifolia     0     0     0     0     0
## Acalypha.macrostachya     0     0     0     0     0
## Adelia.triloba            0     0     0     3     1
In this sub-section, we will apply several basic functions on the first community.
## [1] 0 2 25 1 13 6 4 5 12 8 3 14 10 21 7 22 24 11 15 9 18 17
## [1] 0
## [1] 25
## [1] 0 25
## [1] 1.991111
## [1] 0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   0.000   0.000   1.991   2.000  25.000
## [1] 448
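The function calls for this output are not shown. Judging from the results, they were probably functions such as unique(), min(), max(), range(), mean(), median(), summary() and sum(); this is an assumption. The sketch below applies them to a small hypothetical vector rather than to the BCI data:

```r
plot1 <- c(0, 2, 25, 1, 0, 13, 0, 2)   # hypothetical abundances, not the BCI data
unique(plot1)     # distinct values in order of first appearance: 0 2 25 1 13
min(plot1)        # smallest value: 0
max(plot1)        # largest value: 25
range(plot1)      # minimum and maximum together: 0 25
mean(plot1)       # arithmetic mean: 5.375
median(plot1)     # middle value: 1.5
summary(plot1)    # minimum, quartiles, median, mean and maximum in one call
sum(plot1)        # total number of individuals: 43
```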
Logical and comparison operators can be used to select parts of data according to
certain criteria. The most useful ones are the operators ==, !=, >=, >, <, <=, %in%, !, & and |,
together with the functions which(), is.na() and is.finite().
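A short sketch of how these operators behave on a small hypothetical vector (the values are made up for illustration):

```r
x <- c(5, 0, NA, 12, 3)       # hypothetical values, including a missing one
x > 3                         # element-wise comparison: TRUE FALSE NA TRUE FALSE
which(x > 3)                  # positions where the condition is TRUE: 1 4
x[which(x > 0 & x < 10)]      # both conditions must hold: 5 3
which(x == 0 | x == 12)       # at least one condition must hold: 2 4
x %in% c(0, 3)                # membership test: FALSE TRUE FALSE FALSE TRUE
x[!is.na(x)]                  # drop missing values: 5 0 12 3
```

which() silently drops NA results, which is why it is often wrapped around a condition before indexing.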
# Extracts all species that appear in plot 1
species_plot1 <- row.names(BCI[which(BCI$Plot1 > 0), ])
length(species_plot1)
## [1] 93
## [1] "Alchornea.costaricensis" "Alseis.blackiana"
## [3] "Annona.spraguei" "Apeiba.aspera"
## [5] "Apeiba.tibourbou"
# Extracts all species that appear in Plot 1 and Plot 2
species_plot1and2 <- row.names(BCI[which(BCI$Plot1 > 0 & BCI$Plot2 > 0), ])
length(species_plot1and2)
## [1] 64
# Extracts all species that appear in Plot 1 or Plot 2
species_plot1or2 <- row.names(BCI[which(BCI$Plot1 > 0 | BCI$Plot2 > 0), ])
length(species_plot1or2)
## [1] 113
ind_plot <- colSums(BCI) # Sums all values per column (individuals per plot)
ind_sp <- rowSums(BCI) # Sums all values per row (individuals per species)
mean_ind_plot <- apply(BCI, 1, mean) # Applies the mean function to each row (species)
mean_ind_sp <- apply(BCI, 2, mean) # Applies the mean function to each column (plot)
Many elegant functions for aggregating information in a data.frame
can be found in the package dplyr.
One tremendous advantage of R over other programming languages is its ability to produce nice graphs, ranging from scatterplots to maps.
The R Graph Gallery website shows all the types of plots that can be produced, with associated code.
A boxplot is a standardized way of displaying the
dataset based on a five-number summary: the minimum, the maximum, the
sample median, and the first and third quartiles.
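As a minimal sketch of this five-number summary, using hypothetical values rather than the BCI data:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)        # hypothetical values
quantile(x, c(0, 0.25, 0.5, 0.75, 1))    # minimum, first quartile, median, third quartile, maximum
boxplot(x, main = "Example boxplot")     # the box spans the quartiles, the thick line marks the median
```

In base R, points lying more than 1.5 times the box height beyond the box are drawn individually as outliers, so the whiskers do not always reach the minimum and maximum.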
A scatterplot plots two variables against each other: one on the abscissa (x-axis) and one on the ordinate (y-axis).
plot(BCI[, 1], BCI[, 2],
     main = "Comparison of sites 1 and 2", # plot title
     xlab = "Species in site 1", # label of x-axis
     ylab = "Species in site 2", # label of y-axis
     pch = 16, # type of dots
     col = "dodgerblue") # colour of dots
A histogram describes the frequency of a variable.
## Faramea.occidentalis Trichilia.tuberculata
## 1717 1681
## Alseis.blackiana Gustavia.superba Hirtella.triandra
## 983 644 681
## Oenocarpus.mapora Poulsenia.armata Quararibea.asterolepis
## 788 755 724
## Virola.sebifera
## 617
Sometimes, you need to repeat an operation many times and there is no
dedicated function for what you want to achieve. There are two main ways
to do that: for-loops and functions.
for (i in 1:10) {
  print(i) # print value i to console
  Sys.sleep(1) # one second pause between each iteration
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
# Find the most common species per plot
species <- character(ncol(BCI)) # empty character vector to store the results
for (i in seq_len(ncol(BCI))) {
  species[i] <- row.names(BCI)[which.max(BCI[, i])]
}
head(species)
## [1] "Alseis.blackiana" "Faramea.occidentalis" "Faramea.occidentalis"
## [4] "Faramea.occidentalis" "Socratea.exorrhiza" "Socratea.exorrhiza"
for-loops are very flexible but also relatively
slow.
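myFunction <- function(x) {
  if (x > 5) {
    message("Hello")
  } else {
    message("Goodbye")
  }
}
myFunction(6) # 6 is greater than 5, so the message is "Hello"
myFunction(5) # 5 is not greater than 5, so the message is "Goodbye"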
# Function to find the most common species per plot
common_species <- function(plot) {
  row.names(BCI)[which.max(BCI[, plot])]
}
# Most common species in plot 30?
common_species(30)
## [1] "Faramea.occidentalis"
If we want to apply our function to all the plots of the BCI data.frame, we
can use the sapply() function, which is part of the
apply family of functions.
species_functionapplied <- sapply(1:ncol(BCI), # vector on which we repeat the function
                                  common_species) # function we apply
head(species_functionapplied)
## [1] "Alseis.blackiana" "Faramea.occidentalis" "Faramea.occidentalis"
## [4] "Faramea.occidentalis" "Socratea.exorrhiza" "Socratea.exorrhiza"
## [1] TRUE
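The TRUE above presumably comes from comparing the for-loop result with the sapply() result, for example with identical(); the exact call is not shown, so this is an assumption. A self-contained sketch of that comparison on a small hypothetical table (toy, res_loop and res_apply are illustrative names):

```r
# Hypothetical community table: rows are species, columns are plots
toy <- data.frame(Plot1 = c(3, 0, 1),
                  Plot2 = c(0, 5, 2),
                  row.names = c("sp_a", "sp_b", "sp_c"))

# for-loop version
res_loop <- character(ncol(toy))
for (i in seq_len(ncol(toy))) {
  res_loop[i] <- row.names(toy)[which.max(toy[, i])]
}

# sapply() version
res_apply <- sapply(seq_len(ncol(toy)),
                    function(i) row.names(toy)[which.max(toy[, i])])

identical(res_loop, res_apply)   # TRUE when both approaches give the same result
```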
R is well suited to fitting complex statistical models. Here we only show how to fit a linear model.
We use ChickWeight, a dataset on chicken weights that ships with R. With this dataset, we can explore the relationship between the weight of young chickens and their diet over time.
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
## weight Time Chick Diet
## Min. : 35.0 Min. : 0.00 13 : 12 1:220
## 1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120
## Median :103.0 Median :10.00 20 : 12 3:120
## Mean :121.8 Mean :10.72 10 : 12 4:118
## 3rd Qu.:163.8 3rd Qu.:16.00 17 : 12
## Max. :373.0 Max. :21.00 19 : 12
## (Other):506
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 578 obs. of 4 variables:
## $ weight: num 42 51 59 64 76 93 106 125 149 171 ...
## $ Time : num 0 2 4 6 8 10 12 14 16 18 ...
## $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
## $ Diet : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "formula")=Class 'formula' language weight ~ Time | Chick
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "outer")=Class 'formula' language ~Diet
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "labels")=List of 2
## ..$ x: chr "Time"
## ..$ y: chr "Body weight"
## - attr(*, "units")=List of 2
## ..$ x: chr "(days)"
## ..$ y: chr "(gm)"
##
## Call:
## lm(formula = weight ~ Time, data = ChickWeight)
##
## Coefficients:
## (Intercept) Time
## 27.467 8.803
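The calls that produced the output above are not shown. A minimal sketch, assuming they were head(), summary(), str() and the lm() call given in the printed Call line:

```r
head(ChickWeight)       # first six rows of the dataset
summary(ChickWeight)    # summary statistics for every column
str(ChickWeight)        # structure of the object: 578 obs. of 4 variables
lm(weight ~ Time, data = ChickWeight)   # linear model of weight as a function of time
```

In the formula weight ~ Time, the response variable is on the left of the tilde and the predictor on the right.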
# store the model in an object
weight_mod <- lm(weight ~ Time, data = ChickWeight)
# summary of the model
summary(weight_mod)
##
## Call:
## lm(formula = weight ~ Time, data = ChickWeight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -138.331 -14.536 0.926 13.533 160.669
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.4674 3.0365 9.046 <2e-16 ***
## Time 8.8030 0.2397 36.725 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.91 on 576 degrees of freedom
## Multiple R-squared: 0.7007, Adjusted R-squared: 0.7002
## F-statistic: 1349 on 1 and 576 DF, p-value: < 2.2e-16
# Plot
plot(ChickWeight$Time, ChickWeight$weight, pch = 16)
abline(weight_mod, col = "red", lwd = 2) # add regression line on the plot
# Model with Diet as a covariate
weight_mod2 <- lm(weight ~ Time + Diet, data = ChickWeight)
summary(weight_mod2)
##
## Call:
## lm(formula = weight ~ Time + Diet, data = ChickWeight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -136.851 -17.151 -2.595 15.033 141.816
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.9244 3.3607 3.251 0.00122 **
## Time 8.7505 0.2218 39.451 < 2e-16 ***
## Diet2 16.1661 4.0858 3.957 8.56e-05 ***
## Diet3 36.4994 4.0858 8.933 < 2e-16 ***
## Diet4 30.2335 4.1075 7.361 6.39e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.99 on 573 degrees of freedom
## Multiple R-squared: 0.7453, Adjusted R-squared: 0.7435
## F-statistic: 419.2 on 4 and 573 DF, p-value: < 2.2e-16
# Model predictions
newtime <- seq(min(ChickWeight$Time), max(ChickWeight$Time), length.out = 100)
pred_1 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("1", levels = c("1", "2", "3", "4"))))
pred_2 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("2", levels = c("1", "2", "3", "4"))))
pred_3 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("3", levels = c("1", "2", "3", "4"))))
pred_4 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("4", levels = c("1", "2", "3", "4"))))
plot(ChickWeight$Time, ChickWeight$weight, col = ChickWeight$Diet, pch = 16,
     cex = 0.5)
lines(newtime, pred_1, col = 1, lwd = 2) # add prediction line for Diet1
lines(newtime, pred_2, col = 2, lwd = 2) # add prediction line for Diet2
lines(newtime, pred_3, col = 3, lwd = 2) # add prediction line for Diet3
lines(newtime, pred_4, col = 4, lwd = 2) # add prediction line for Diet4
For every linear model, we should inspect the residuals to check whether any assumption is violated.
# Check residuals of the model
par(mfrow = c(2, 2))
plot(weight_mod2, which = 1)
plot(weight_mod2, which = 2)
plot(weight_mod2, which = 3)
plot(weight_mod2, which = 4)
As the residuals show some trends, the model should be improved.
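One possible refinement, shown only as a sketch: let the growth rate differ between diets by adding a Time-by-Diet interaction. The choice of this particular model and the name weight_mod3 are assumptions, not part of the original analysis.

```r
# Model from above, refitted here so that the block runs on its own
weight_mod2 <- lm(weight ~ Time + Diet, data = ChickWeight)
# Assumed refinement: the slope of Time may differ between diets
weight_mod3 <- lm(weight ~ Time * Diet, data = ChickWeight)

anova(weight_mod2, weight_mod3)   # F-test comparing the two nested models
AIC(weight_mod2, weight_mod3)     # a lower AIC indicates the better-supported model
```

Whether this removes the trends in the residuals would have to be checked with the same diagnostic plots as above.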
Some R packages can produce interactive plots. We quickly illustrate
how you can produce an interactive plot using the ggplot2
and plotly R packages. Here we plot the number of
individuals against the species richness of the BCI plot data. Further
details will be provided in the next practicals.
# install.packages() installs packages that you don't have yet on your computer
# install.packages("ggplot2"); install.packages("plotly")
# library() loads locally installed packages into your R console
library(ggplot2)
library(plotly)
# 1/0 matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
pot <- data.frame(plot = colnames(BCI),
                  nb_ind = colSums(BCI),
                  rich = colSums(BCI_bin))
int_plot <- ggplot(pot, aes(nb_ind, rich)) +
  geom_point(aes(color = plot)) +
  scale_color_viridis_d("Plot") +
  labs(x = "Number of individuals", y = "Species richness") +
  theme_classic()
ggplotly(int_plot)
R Reference card https://cran.r-project.org/doc/contrib/Short-refcard.pdf
Cookbook for R http://www.cookbook-r.com/
Stack Overflow http://www.stackoverflow.com
Quick-R https://www.statmethods.net/
or do an online search (Google, DuckDuckGo):
‘R package/function name error message’
‘R how to remove every third line from a data.frame’
Look for local or university user groups (via Slack, Twitter, etc.)
R for Data Science (by Hadley Wickham) http://r4ds.had.co.nz/
Coding Club https://ourcodingclub.github.io/tutorials/
Basics and a lot more http://ohi-science.org/news/Resources-for-R-and-Data-Science
Guide for reproducible code https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf