
In this class, we will go through some basic and essential functions of R.

1. Starting with R & RStudio

1.1. What is R?

  • Statistical computing environment
  • Basic statistics (e.g. ANOVA, t-tests)
  • Complex modeling (machine learning, Bayesian stats, simulations…)
  • Database manager
  • Graphic design
  • GIS
  • Version control (via GitHub, etc.)
  • Publisher (basic word processing, web pages, scientific articles, theses, books…)

Download R: https://cran.r-project.org/

1.2. Why R?

  • Most frequently used programming language in ecology/biodiversity analyses
  • Transparency/Open science
  • Reproducibility
  • Marketable skill (outside of science too!)
  • Large online user community for help (almost every question you’ll have has already been answered!)

1.3. RStudio (IDE of R)

Although R itself contains everything you need, its interface is quite basic and it does not come with a text editor.

RStudio is an integrated development environment (IDE) developed by Posit (formerly known as RStudio Inc.) that makes R easier and more pleasant to use. It includes four main panels, as shown in the following figure.

RStudio homepage: https://posit.co/products/open-source/rstudio/

Download RStudio: https://posit.co/download/rstudio-desktop/

1.4. Create a project

A project is a working directory identified by an .Rproj file. When you open a project (using File | Open Project in RStudio or by double-clicking the .Rproj file outside of R), the working directory is automatically set to the directory that contains the .Rproj file. This helps you keep track of your files and ensures that others can run your scripts, because they rely on relative rather than absolute paths. A project can also be linked to version control.

  • Open RStudio

  • File | New Project

  • Select either ‘New Directory’ or ‘Existing Directory’. This will be the working folder for our course (on your computer)

  • Choose where to place the project on your computer (e.g. ‘~/Desktop/Courses/Biodiversity’)

  • Name your project

  • Within your project’s folder, create several sub-folders:

      • data

      • scripts

      • results

      • figures

  • Open a new file: File | New File | R script

  • Type: # Day 1: getting started

  • Save the file in the scripts sub-folder as “0_Code_Day1.R”

  • Start coding!

  • In the next classes, just open the R project and reopen previous R scripts or create new R scripts within the project.
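The sub-folders from the steps above can also be created from R itself. The following is a minimal sketch, not part of the original course material: `project_root` is a hypothetical stand-in that points to a temporary directory, whereas in a real RStudio project you could use "." because the working directory is already the project folder.

```r
# Stand-in for the project root (assumption); in a real project you could use "."
project_root <- file.path(tempdir(), "demo_project")
dir.create(project_root, showWarnings = FALSE)

# Sub-folders suggested in the steps above
subfolders <- c("data", "scripts", "results", "figures")
for (folder in subfolders) {
  dir.create(file.path(project_root, folder), showWarnings = FALSE)
}

# Relative paths can be built with file.path()
file.path("data", "BCI_Data.csv")

list.files(project_root)
```

Because such paths are relative to the project folder, the same script should run unchanged on another computer.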

2. Presentation

2.1. Local environment

Where are we, and which objects are currently loaded? Copy the following code into your script and run it line by line, either by clicking Run at the top right of the script panel or by pressing Ctrl+Enter. You can also type code directly into the console and run it with Enter, but this is not recommended because the code is then not saved and cannot be reused. Always write your code in a script and save it for later.

# Current directory
getwd() # root of the R project
## [1] "C:/POSTDOC/Teaching/Biodiversity_ecosystem_functioning/2025_2026"
# to change it
# setwd("C:/new_directory")

# Check the objects loaded in the local environment
ls()
## character(0)
# Creating an object a
a <- 1
ls()
## [1] "a"

The objects are also listed in RStudio under ‘Environment’ in the upper right-hand field.

If you would like to see this tutorial in your viewer within RStudio (saves space on the screen), please execute the following slightly more complicated code (after that it will be easier again):

install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson1.html",
              destfile = file.path(dir, "lesson1.html"))
htmlFile <- file.path(dir, "lesson1.html")
rstudioapi::viewer(htmlFile)

Now you can easily copy code from the viewer into your script.

2.2. Basic operations

Calculating in R (Mathematical Operators)

3+4 # Addition
## [1] 7
3*4 # Multiplication
## [1] 12
12/4 # Division
## [1] 3
3^2 # Power
## [1] 9
sqrt(9) # Square root
## [1] 3

Note that to calculate the square root we used a “basic” R function, sqrt().
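A few further operators and functions are often useful alongside the ones above. This is a small illustrative sketch of integer division, the remainder, operator precedence and logarithms; the numbers are arbitrary.

```r
7 %/% 2               # integer division: 3
7 %% 2                # remainder (modulo): 1
(3 + 4) * 2           # parentheses are evaluated first: 14
3 + 4 * 2             # multiplication binds tighter than addition: 11
-2^2                  # the power is applied before the minus sign: -4
exp(1)                # Euler's number
log(100, base = 10)   # logarithm to base 10: 2
```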

2.3. Help

Help pages describe the arguments a function expects and the value it returns.

?sqrt # opens the help page of a function (lower right panel in RStudio, or the browser)
??squareroot # searches the help pages for a keyword

3. Data creation and description

3.1. Data creation

R offers several data structures: vector, matrix, data.frame and list.

3.1.0. Types

There are several data types available in R: numerics, integers (numeric without decimals), characters, factors and logicals.

class(1)
## [1] "numeric"
class(1L) # (the L tells R to store this as an integer)
## [1] "integer"
class("a")
## [1] "character"
class(as.factor("a"))
## [1] "factor"
class(TRUE)
## [1] "logical"
# Missing value: NA
NA
## [1] NA
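Values can be converted from one type to another, and missing values need some care in calculations. The sketch below uses a hypothetical vector `x` to illustrate both points.

```r
as.numeric("3.5")        # character to numeric: 3.5
as.character(10)         # numeric to character: "10"
as.integer(TRUE)         # logical to integer: 1

x <- c(2, NA, 4)         # hypothetical vector with one missing value
is.na(x)                 # FALSE TRUE FALSE
mean(x)                  # NA, because one value is missing
mean(x, na.rm = TRUE)    # 3, missing values are dropped first
```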

3.1.1. Vectors

Let’s now create vectors. A vector is simply an ordered collection of items that are all of the same type.

a <- 1
a
## [1] 1
str(a) # str() returns information about an object
##  num 1
class(a) # class() returns the class of an object
## [1] "numeric"
a <- c(1, 3) # c() for a vector
a; str(a)
## [1] 1 3
##  num [1:2] 1 3
a[2] # second value of our vector
## [1] 3
a[2] <- 5 #  changes the second value to 5
a[2]
## [1] 5
# Vectors can also contain characters
b <- "Fagus"
class(b)
## [1] "character"

Sequences can be created automatically.

a <- c(1:10) # generates a sequence from 1 to 10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
b <- seq(1, 12, by = 2) # A sequence from 1 to 12 in steps of 2
b
## [1]  1  3  5  7  9 11
b <- seq(1, 11, length.out = 6) # A sequence from 1 to 11 with length 6

vector_1 <- rep(c(1, 2), 3) # repeats c(1, 2) three times
vector_1
## [1] 1 2 1 2 1 2
vector_1 <- rep(c(1, 2), each = 3) # repeats each element of c(1, 2) three times
vector_1
## [1] 1 1 1 2 2 2

You can also calculate with vectors.

vector_2 <- c(1:6)
vector_2^2
## [1]  1  4  9 16 25 36
vector_3 <- vector_1 + vector_2
vector_3
## [1] 2 3 4 6 7 8
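When two vectors of different lengths are combined, R recycles the shorter one. Vectors can also be indexed with conditions or negative positions. A short sketch with a hypothetical vector `x`:

```r
x <- 1:6
x * c(1, 10)      # the shorter vector is recycled: 1 20 3 40 5 60
x[x > 3]          # logical indexing: 4 5 6
x[-1]             # a negative index drops that element: 2 3 4 5 6
x[c(1, 3)]        # several positions at once: 1 3
length(x)         # number of elements: 6
```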

3.1.2. Matrices

Now, let’s create a matrix with two rows and two columns.

mat <- matrix(data = c(1, 0, 0, 1), nrow = 2, ncol = 2)
mat
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
str(mat)
##  num [1:2, 1:2] 1 0 0 1
class(mat)
## [1] "matrix" "array"
mat[1, 2] # first row and second column
## [1] 0
# dimension of the object
dim(mat)
## [1] 2 2
# number of rows and columns
nrow(mat); ncol(mat)
## [1] 2
## [1] 2
# add rownames and column names with paste()
rownames(mat) <- paste("row", seq_len(nrow(mat)), sep = "_")
rownames(mat)
## [1] "row_1" "row_2"
colnames(mat) <- paste("col", seq_len(ncol(mat)), sep = "_")
mat
##       col_1 col_2
## row_1     1     0
## row_2     0     1
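By default, matrix() fills values column by column, which is easy to overlook with a symmetric example. The following sketch uses hypothetical matrices `m_col` and `m_row` to show the fill order and a few common matrix operations.

```r
m_col <- matrix(1:6, nrow = 2)                 # filled column by column (default)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
m_col
m_row

t(m_col)             # transpose: 3 rows, 2 columns
m_col * 2            # element-wise multiplication
m_col %*% t(m_col)   # matrix multiplication, returns a 2 x 2 matrix
m_row[, 2]           # second column: 2 5
m_row[1, ]           # first row: 1 2 3
```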

3.1.3. Data.frames

A data.frame is a table whose columns can contain different data types. Indexing works as for a matrix, but columns can also be referred to directly by name.

dat <- data.frame(ID = vector_2,
                  number_1 = vector_3/2, 
                  color = c("blue", "red", "green", "yellow", "grey", "black"))
dat
##   ID number_1  color
## 1  1      1.0   blue
## 2  2      1.5    red
## 3  3      2.0  green
## 4  4      3.0 yellow
## 5  5      3.5   grey
## 6  6      4.0  black
str(dat)
## 'data.frame':    6 obs. of  3 variables:
##  $ ID      : int  1 2 3 4 5 6
##  $ number_1: num  1 1.5 2 3 3.5 4
##  $ color   : chr  "blue" "red" "green" "yellow" ...
class(dat)
## [1] "data.frame"
colnames(dat) # returns the column names of our data.frame
## [1] "ID"       "number_1" "color"
dat[1, 2] # first row and second column
## [1] 1
dat[1, "number_1"]
## [1] 1
dat$number_1 # $ to get the values inside a column
## [1] 1.0 1.5 2.0 3.0 3.5 4.0
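Rows of a data.frame can be filtered with a condition, and new columns can be added with `$`. The sketch below uses a hypothetical table `trees` with invented values that are unrelated to the course data.

```r
# Hypothetical toy table (invented values)
trees <- data.frame(species = c("Fagus", "Quercus", "Pinus", "Acer"),
                    height_m = c(30, 25, 18, 12))

trees[trees$height_m > 15, ]           # rows where height exceeds 15 m
trees$tall <- trees$height_m >= 25     # add a logical column
trees[order(trees$height_m), ]         # sort rows by increasing height
head(trees, 2)                         # first two rows
```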

3.1.4. Lists

And finally a list with two elements.

ex_list <- list(l1 = c(1, 0),
                l2 = c(0, 1))
ex_list
## $l1
## [1] 1 0
## 
## $l2
## [1] 0 1
str(ex_list)
## List of 2
##  $ l1: num [1:2] 1 0
##  $ l2: num [1:2] 0 1
class(ex_list)
## [1] "list"
names(ex_list)
## [1] "l1" "l2"
# To access a list element, use [[]]
ex_list[[1]]
## [1] 1 0
# single brackets with a name return a sub-list; use ex_list[["l1"]] or ex_list$l1 for the element itself
ex_list["l1"]
## $l1
## [1] 1 0
# a list can hold objects of different types, for example our data.frame
ex_list[[3]] <- dat
str(ex_list)
## List of 3
##  $ l1: num [1:2] 1 0
##  $ l2: num [1:2] 0 1
##  $   :'data.frame':  6 obs. of  3 variables:
##   ..$ ID      : int [1:6] 1 2 3 4 5 6
##   ..$ number_1: num [1:6] 1 1.5 2 3 3.5 4
##   ..$ color   : chr [1:6] "blue" "red" "green" "yellow" ...
length(ex_list) # number of list elements
## [1] 3
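The difference between single and double brackets is a common source of confusion. This short sketch uses a hypothetical list `demo_list` to show what each form returns.

```r
demo_list <- list(first = c(1, 0), second = c("a", "b"))  # hypothetical example list

class(demo_list["first"])     # "list": single brackets keep the list wrapper
class(demo_list[["first"]])   # "numeric": double brackets return the element
demo_list$first               # same as demo_list[["first"]]

demo_list$third <- TRUE       # add a named element
names(demo_list)              # "first" "second" "third"
```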

3.2. Import data

R can import many file formats, including Excel sheets and .csv files.

Here we import tropical forest plot data from Barro Colorado Island (BCI) in Panama. Please download the file BCI_Data.csv from Stud-IP and copy it into the data directory in your project folder.

BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # the first line contains the column names
                  sep = ",", # character that separates the columns
                  row.names = 1) # the first column contains the row names

class(BCI)
## [1] "data.frame"
nrow(BCI)
## [1] 225
ncol(BCI)
## [1] 50
BCI[1:5, 1:5]
##                       Plot1 Plot2 Plot3 Plot4 Plot5
## Abarema.macradenium       0     0     0     0     0
## Acacia.melanoceras        0     0     0     0     0
## Acalypha.diversifolia     0     0     0     0     0
## Acalypha.macrostachya     0     0     0     0     0
## Adelia.triloba            0     0     0     3     1
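To try the import step without the course file, a small table can be written to disk and read back. This is only a sketch: `toy`, `tmp_file` and `toy_back` are hypothetical names, the values are invented, and a temporary file stands in for the data folder.

```r
# Hypothetical toy community table: species in rows, plots in columns
toy <- data.frame(Plot1 = c(0, 3), Plot2 = c(1, 0),
                  row.names = c("Species.a", "Species.b"))

tmp_file <- tempfile(fileext = ".csv")  # temporary file instead of data/...
write.csv(toy, tmp_file)                # row names are written as the first column

# read.csv() is read.table() with header = TRUE and sep = "," as defaults
toy_back <- read.csv(tmp_file, row.names = 1)
toy_back
```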

3.3. Summary statistics

In this sub-section, we will apply several basic functions on the first community.

com1 <- BCI[, 1] # Extract first community

unique(com1) # Unique values
##  [1]  0  2 25  1 13  6  4  5 12  8  3 14 10 21  7 22 24 11 15  9 18 17
min(com1) # Minimum value
## [1] 0
max(com1) # Maximum value
## [1] 25
range(com1) # Min and Max
## [1]  0 25
mean(com1) # Mean
## [1] 1.991111
median(com1) # Median
## [1] 0
summary(com1) # summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.991   2.000  25.000
sum(com1) # summing all individuals
## [1] 448

3.4. Logical operators

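Logical operators can be used to select parts of the data according to certain criteria. Useful tools are the comparison operators ==, !=, >=, >, <, <= and %in%, the logical operators ! (not), & (and) and | (or), and the functions which(), is.na() and is.finite().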

# Extracts all species that appear in plot 1
species_plot1 <- row.names(BCI[which(BCI$Plot1 > 0), ])
length(species_plot1)
## [1] 93
species_plot1[1:5]
## [1] "Alchornea.costaricensis" "Alseis.blackiana"       
## [3] "Annona.spraguei"         "Apeiba.aspera"          
## [5] "Apeiba.tibourbou"
# Extracts all species that appear in Plot 1 and Plot 2
species_plot1and2 <- row.names(BCI[which(BCI$Plot1 > 0 & BCI$Plot2 > 0), ]) 
length(species_plot1and2)
## [1] 64
# Extracts all species that appear in Plot 1 or Plot 2
species_plot1or2 <- row.names(BCI[which(BCI$Plot1 > 0 | BCI$Plot2 > 0), ]) 
length(species_plot1or2)
## [1] 113
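The same operators can be tried on a small vector, which also shows how missing values behave in comparisons. The vector `abund` below is hypothetical.

```r
abund <- c(sp_a = 0, sp_b = 4, sp_c = NA, sp_d = 12)  # hypothetical abundances

abund > 0                             # a comparison with NA stays NA
which(abund > 0)                      # positions of TRUE values; NA is ignored
abund[!is.na(abund) & abund > 0]      # non-missing values above zero
names(abund) %in% c("sp_a", "sp_d")   # membership test: TRUE FALSE FALSE TRUE
abund != 4                            # not equal
```

Wrapping a condition in which() is one way to avoid rows of NA when some values are missing.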

3.5. Aggregate information

ind_plot <- colSums(BCI) # Sums each column: individuals per plot
ind_sp <- rowSums(BCI) # Sums each row: individuals per species

mean_ind_plot <- apply(BCI, 1, mean) # Mean of each row: mean individuals per plot, for each species
mean_ind_sp <- apply(BCI, 2, mean) # Mean of each column: mean individuals per species, for each plot

Many elegant functions for aggregating information in a data.frame can be found in the package dplyr.
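The row and column logic of these functions is easiest to see on a small example. The sketch below uses a hypothetical matrix `toy` with invented abundances, with species in rows and plots in columns as in the BCI data.

```r
# Hypothetical toy matrix: 3 species (rows) in 2 plots (columns)
toy <- matrix(c(0, 2, 5,
                1, 0, 3),
              nrow = 3,
              dimnames = list(c("sp_a", "sp_b", "sp_c"), c("Plot1", "Plot2")))

colSums(toy)          # individuals per plot: 7 4
apply(toy, 2, sum)    # same result with apply() over columns (margin 2)
rowSums(toy)          # individuals per species: 1 2 8
apply(toy, 1, max)    # any function works, here the maximum per species: 1 2 5
```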

4. Plotting

One major strength of R is its ability to produce high-quality graphics, ranging from scatterplots to maps.

The R Graph Gallery website shows all the types of plots that can be produced, with associated code.

4.1. Boxplot

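A boxplot is a standardized way of displaying the distribution of a dataset based on a five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles. In R, the whiskers by default extend to the most extreme points within 1.5 times the interquartile range from the box, and points beyond them are drawn individually as outliers.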

boxplot(ind_plot,
        main = "Number of individuals per plot") # plot title

# Hiding outliers (they are not drawn, but remain in the data)
boxplot(ind_plot, outline = FALSE,
        main = "Number of individuals per plot")

Figure: XKCD boxplots

4.2. Scatterplot

A scatterplot displays one variable against another along two axes: the abscissa (x-axis) and the ordinate (y-axis).

plot(BCI[, 1], BCI[, 2],
     main = "Comparison of sites 1 and 2", # plot title
     xlab = "Species in site 1", # label of x-axis
     ylab = "Species in site 2", # label of y-axis
     pch = 16, # type of dots
     col = "dodgerblue") # colour of dots

4.3. Histograms

A histogram shows the frequency distribution of a variable.

hist(ind_sp, col = "grey",
     xlab = "Number", # x-axis labeling
     main = "Individuals per species") 

ind_sp[ind_sp > 1000]
##  Faramea.occidentalis Trichilia.tuberculata 
##                  1717                  1681
ind_sp[ind_sp > 500 & ind_sp < 1000]
##       Alseis.blackiana       Gustavia.superba      Hirtella.triandra 
##                    983                    644                    681 
##      Oenocarpus.mapora       Poulsenia.armata Quararibea.asterolepis 
##                    788                    755                    724 
##        Virola.sebifera 
##                    617
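Plots such as the histogram above can also be written to a file, for example into the figures sub-folder of the project. The following is a minimal sketch with simulated data: `counts` and `out_file` are hypothetical, and a temporary directory replaces the project folder.

```r
set.seed(1)                          # reproducible hypothetical data
counts <- rpois(100, lambda = 3)

# In a project this could be file.path("figures", "histogram_demo.pdf")
out_file <- file.path(tempdir(), "histogram_demo.pdf")

pdf(out_file, width = 7, height = 5) # open a PDF graphics device
hist(counts, col = "grey", xlab = "Number", main = "Hypothetical counts")
dev.off()                            # close the device so the file is written

file.exists(out_file)
```

Other devices such as png() work the same way: open the device, draw the plot, then close it with dev.off().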

5. For-loops and functions

Sometimes, you need to repeat an operation many times and there is no dedicated function for what you want to achieve. There are two main ways to do that: for-loops and functions.

5.1. for-loop

for (i in 1:10){
  print(i) # print value i to console
  Sys.sleep(1) # one second pause between each iteration
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
# Find the most common species per plot
species <- character(ncol(BCI)) # pre-allocated vector to store the results
for (i in seq_len(ncol(BCI))){
  species[i] <- row.names(BCI)[which.max(BCI[, i])]
}
head(species)
## [1] "Alseis.blackiana"     "Faramea.occidentalis" "Faramea.occidentalis"
## [4] "Faramea.occidentalis" "Socratea.exorrhiza"   "Socratea.exorrhiza"

for-loops are very flexible, but they can be relatively slow in R, especially when a vectorized function could do the same job.
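Two habits tend to make loops more robust: creating the result vector at its final length before the loop, and iterating with seq_along(). The sketch below uses hypothetical values and compares a loop with the built-in vectorized function cumsum().

```r
values <- c(3, 1, 4, 1, 5)                 # hypothetical input
running_total <- numeric(length(values))   # pre-allocated result vector

for (i in seq_along(values)) {
  if (i == 1) {
    running_total[i] <- values[i]
  } else {
    running_total[i] <- running_total[i - 1] + values[i]
  }
}

running_total     # 3 4 8 9 14
cumsum(values)    # built-in vectorized equivalent
```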

5.2. Create a function

myFunction <- function(x){
  if (x > 5){
    message("Hello")
  } else {
    message("Goodbye")
  }
}

myFunction(6)
myFunction(5)

# Function to find the most common species per plot
common_species <- function(plot){
  row.names(BCI)[which.max(BCI[, plot])]
}
# Most common species in plot 30?
common_species(30)
## [1] "Faramea.occidentalis"

If we want to apply our function to all the plots of the BCI data.frame, we can use the sapply() function, which is part of the apply family of functions.

species_functionapplied <- sapply(1:ncol(BCI), # vector on which we repeat the function
                                  common_species) # function we apply
head(species_functionapplied)
## [1] "Alseis.blackiana"     "Faramea.occidentalis" "Faramea.occidentalis"
## [4] "Faramea.occidentalis" "Socratea.exorrhiza"   "Socratea.exorrhiza"
# comparison for-loop and function
all(species == species_functionapplied)
## [1] TRUE
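Functions can take default argument values, and vapply() is a stricter relative of sapply() that checks the type of each result. The sketch below is illustrative only: `top_name` and `plots` are hypothetical and use invented abundances rather than the BCI data.

```r
# Hypothetical helper: names of the n largest entries in a named vector
top_name <- function(x, n = 1) {
  names(sort(x, decreasing = TRUE))[seq_len(n)]
}

plots <- list(plot_a = c(sp_a = 2, sp_b = 7, sp_c = 1),
              plot_b = c(sp_a = 9, sp_b = 0, sp_c = 4))

sapply(plots, top_name)                             # default n = 1
vapply(plots, top_name, FUN.VALUE = character(1))   # same, with a type check
top_name(plots$plot_a, n = 2)                       # "sp_b" "sp_a"
```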

6. Statistical modelling

R is well suited to building complex statistical models. Here we only show how to fit a linear model.

6.1. Linear models

Here we load a dataset on chick weights that ships with R. It lets us examine how the weight of young chicks relates to their diet over time.

data("ChickWeight") # data() loads a built-in dataset
head(ChickWeight) # first 6 rows of the dataset
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
summary(ChickWeight)
##      weight           Time           Chick     Diet   
##  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
##  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
##  Median :103.0   Median :10.00   20     : 12   3:120  
##  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
##  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
##  Max.   :373.0   Max.   :21.00   19     : 12          
##                                  (Other):506
str(ChickWeight)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   578 obs. of  4 variables:
##  $ weight: num  42 51 59 64 76 93 106 125 149 171 ...
##  $ Time  : num  0 2 4 6 8 10 12 14 16 18 ...
##  $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
##  $ Diet  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "formula")=Class 'formula'  language weight ~ Time | Chick
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "outer")=Class 'formula'  language ~Diet
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Time"
##   ..$ y: chr "Body weight"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(days)"
##   ..$ y: chr "(gm)"
# Linear model: weight as a function of time
lm(weight ~ Time, data = ChickWeight)
## 
## Call:
## lm(formula = weight ~ Time, data = ChickWeight)
## 
## Coefficients:
## (Intercept)         Time  
##      27.467        8.803
# store the model in an object
weight_mod <- lm(weight ~ Time, data = ChickWeight)

# summary of the model
summary(weight_mod)
## 
## Call:
## lm(formula = weight ~ Time, data = ChickWeight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -138.331  -14.536    0.926   13.533  160.669 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  27.4674     3.0365   9.046   <2e-16 ***
## Time          8.8030     0.2397  36.725   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.91 on 576 degrees of freedom
## Multiple R-squared:  0.7007, Adjusted R-squared:  0.7002 
## F-statistic:  1349 on 1 and 576 DF,  p-value: < 2.2e-16
# Plot
plot(ChickWeight$Time, ChickWeight$weight, pch = 16)
abline(weight_mod, col = "red", lwd = 2) # add regression line on the plot

# Model with Diet as a covariate
weight_mod2 <- lm(weight ~ Time + Diet, data = ChickWeight)
summary(weight_mod2)
## 
## Call:
## lm(formula = weight ~ Time + Diet, data = ChickWeight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -136.851  -17.151   -2.595   15.033  141.816 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.9244     3.3607   3.251  0.00122 ** 
## Time          8.7505     0.2218  39.451  < 2e-16 ***
## Diet2        16.1661     4.0858   3.957 8.56e-05 ***
## Diet3        36.4994     4.0858   8.933  < 2e-16 ***
## Diet4        30.2335     4.1075   7.361 6.39e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.99 on 573 degrees of freedom
## Multiple R-squared:  0.7453, Adjusted R-squared:  0.7435 
## F-statistic: 419.2 on 4 and 573 DF,  p-value: < 2.2e-16
# Model predictions
newtime <- seq(min(ChickWeight$Time), max(ChickWeight$Time), length.out = 100)
pred_1 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("1", levels = c("1", "2", "3", "4"))))
pred_2 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("2", levels = c("1", "2", "3", "4"))))
pred_3 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("3", levels = c("1", "2", "3", "4"))))
pred_4 <- predict(
  weight_mod2,
  newdata = data.frame(Time = newtime,
                       Diet = factor("4", levels = c("1", "2", "3", "4"))))


plot(ChickWeight$Time, ChickWeight$weight, col = ChickWeight$Diet, pch = 16,
     cex = 0.5)
lines(newtime, pred_1, col = 1, lwd = 2) # add prediction line for Diet1
lines(newtime, pred_2, col = 2, lwd = 2) # add prediction line for Diet2
lines(newtime, pred_3, col = 3, lwd = 2) # add prediction line for Diet3
lines(newtime, pred_4, col = 4, lwd = 2) # add prediction line for Diet4
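The four predict() calls above follow the same pattern, so they can also be written as a single sapply() call over the diet levels. This is an alternative sketch rather than the course's own code; `mod`, `diets` and `pred_all` are names introduced here for illustration.

```r
data("ChickWeight")
mod <- lm(weight ~ Time + Diet, data = ChickWeight)

newtime <- seq(min(ChickWeight$Time), max(ChickWeight$Time), length.out = 100)
diets <- levels(ChickWeight$Diet)

# One column of predictions per diet level
pred_all <- sapply(diets, function(d) {
  predict(mod, newdata = data.frame(Time = newtime,
                                    Diet = factor(d, levels = diets)))
})
dim(pred_all)   # 100 rows (time points), 4 columns (diets)

plot(ChickWeight$Time, ChickWeight$weight, col = ChickWeight$Diet,
     pch = 16, cex = 0.5)
for (j in seq_along(diets)) {
  lines(newtime, pred_all[, j], col = j, lwd = 2)  # one line per diet
}
```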

For every linear model, we should inspect the residuals to check whether any assumption is violated.

# Check residuals of the model
par(mfrow = c(2, 2))
plot(weight_mod2, which = 1)
plot(weight_mod2, which = 2)
plot(weight_mod2, which = 3)
plot(weight_mod2, which = 4)

As the residuals show some trends, the model should be improved.
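One possible next step, shown here only as a sketch and not necessarily the improvement the course intends, is to let the effect of time differ between diets by adding an interaction term. The names `mod_add` and `mod_int` are introduced for this example.

```r
data("ChickWeight")
mod_add <- lm(weight ~ Time + Diet, data = ChickWeight)   # additive model from above
mod_int <- lm(weight ~ Time * Diet, data = ChickWeight)   # adds Time:Diet terms

anova(mod_add, mod_int)   # F-test comparing the two nested models
AIC(mod_add, mod_int)     # a lower AIC indicates a better trade-off

par(mfrow = c(2, 2))
plot(mod_int)             # the residuals of the new model still need inspection
par(mfrow = c(1, 1))
```

The interaction model would still need the same residual checks before it is interpreted.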

7. Interactive plots

Some R packages can produce interactive plots. Here we briefly illustrate how to produce an interactive plot using the ggplot2 and plotly R packages. We plot the number of individuals against the species richness of the BCI plots. Further details will be provided in the next practicals.

# install.packages() installs packages that you don't have yet on your computer
# install.packages("ggplot2"); install.packages("plotly")
# library() loads locally installed packages into your R console
library(ggplot2)
library(plotly)

# 1/0 matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1

pot <- data.frame(plot = colnames(BCI),
                  nb_ind = colSums(BCI),
                  rich = colSums(BCI_bin))

int_plot <- ggplot(pot, aes(nb_ind, rich)) +
  geom_point(aes(color = plot)) +
  scale_color_viridis_d("Plot") +
  labs(x = "Number of individuals", y = "Species richness") +
  theme_classic()

ggplotly(int_plot)
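The richness calculation used above (copying the table and setting all positive values to 1) can also be written in one step, because TRUE counts as 1 when summed. A small sketch on a hypothetical matrix `toy` with invented abundances:

```r
# Hypothetical toy abundance matrix: species in rows, plots in columns
toy <- matrix(c(0, 2, 5,
                1, 0, 0),
              nrow = 3,
              dimnames = list(c("sp_a", "sp_b", "sp_c"), c("Plot1", "Plot2")))

rich <- colSums(toy > 0)   # number of species present per plot: 2 1
nb_ind <- colSums(toy)     # total individuals per plot: 7 1

data.frame(plot = colnames(toy), nb_ind = nb_ind, rich = rich)
```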

8. Resources
