Before starting
If you would like to see this tutorial in your viewer within RStudio, please execute the following chunk of code:

install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson5.html",
              destfile = file.path(dir, "lesson5.html"))
htmlFile <- file.path(dir, "lesson5.html")
rstudioapi::viewer(htmlFile)

In this practical, we will go through the three main approaches to compute $\beta$-diversity.

$\alpha$ diversity versus $\gamma$ diversity
slope of the species-surface curve
similarity indices

Species turnover between 3 plots.

For this practical, we will use the same dataset we used for the previous practicals, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset.

BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # does the file have column names
                  sep = ",", # what separates the column
                  row.names = 1) # does the file have row names

# Transpose the dataset to have plots as rows and species as columns
BCI <- t(BCI)

# Presence/absence matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1

1. Pool level

1.1. Alpha diversity versus gamma diversity

Lande’s additive partitionning. \[ \beta = \gamma - \overline{\alpha} \]

Calculate the Lande’s index on BCI data (Tips: you can use the function specnumber() included in the vegan package to calculate the number of species per plot).

Solution

library(vegan)
alpha <- specnumber(BCI) # or use the binary site-species matrix
gamma <- ncol(BCI[, colSums(BCI) > 0])

gamma - mean(alpha)

## [1] 134.22

Whittaker’s multiplicative partitionning. \[ \beta = \frac{\gamma}{\overline{\alpha}} \]

Calculate the Whittaker’s index.

Solution

alpha <- specnumber(BCI)
gamma <- ncol(BCI[, colSums(BCI) > 0])

gamma/mean(alpha)

## [1] 2.478519

1.2. Species-surface curve

The slope of the species-area curve also gives us a measure of $\beta$ diversity since it tells us how many new species are added when we consider a bigger area.
Considering the BCI data, we know that each plot has a 1 hectare surface. We can therefore reconstruct a species-surface curve using random sample of a varying number of plots. At each random sample, we calculate the average species number. The approach is similar to the species accumulation curve we saw in the practical 4.

To do so, we will use a for-loop with the $\alpha$ richness of random plot aggregates.

As an example, let’s start with a random sample of two plots.

set.seed(1) # random seed
random_pair <- sample(names(alpha), 2) # we "randomly" pick two plots
random_pair

## [1] "Plot4"  "Plot39"

# Warning: some species occur in both plots so
# we cannot simply take the sum of the alpha diversity values
sum(alpha[random_pair]) # species richness if we do a simple sum

## [1] 178

# we need to consider the number of unique species!
tmp <- BCI_bin[random_pair, ] 
sum(colSums(tmp) > 0)

## [1] 111

# now we repeat this 100 times to have a solid average estimate
mean_S <- replicate(100, {
  tmp <- BCI_bin[sample(rownames(BCI_bin), 2), ]
  ncol(tmp[, colSums(tmp) > 0])
})
# mean of the 100 replicates
length(mean_S)

## [1] 100

mean(mean_S)

## [1] 121.12

We now repeat this step with several aggregates of plot number (and therefore surface):

# and now a for-loop for several sample sizes
sample_size <- c(1:50)
mean_S <- c()
for(i in sample_size){
  S <- replicate(100, {
    tmp <- BCI_bin[sample(rownames(BCI_bin), i), ,drop=FALSE]
    sum(colSums(tmp) > 0)})
  mean_S[i] <- mean(S)
}

# data.frame with species richness and area
sp_area <- data.frame(area = sample_size,
                      S = mean_S)

Now the plot.

plot(sp_area$area, sp_area$S, pch = 16, type = "b", cex = 0.5, lwd = 1,
     main = "Species-area curve",
     xlab = "Area (ha)", ylab = "Species richness")

$\beta$-diversity estimate corresponds to the exponent $z$ of the Arrhenius model $S = cA^z$, where $S$ is the species number and $A$ is the area.

To retrieve it, we can calculate a linear model of the log-transformed number of species as a function of the log-transformed area, since we have: \[ log_{10}(S) = log_{10}(c) + zlog_{10}(A) \]

# Log-log scale
sp_area$log_area <- log(sp_area$area, base = 10)
sp_area$log_S <- log(sp_area$S, base = 10)

plot(sp_area$log_area, sp_area$log_S, pch = 16, type = "b", cex = 0.5,
     lwd = 1, main = "Species-area curve",
     xlab = "Log Area (ha)", ylab = "Log species richness")

# Extract linear coefficient
log_mod <- lm(log_S ~ log_area, data = sp_area)
log_mod

## 
## Call:
## lm(formula = log_S ~ log_area, data = sp_area)
## 
## Coefficients:
## (Intercept)     log_area  
##      2.0547       0.1858

round(log_mod$coefficients[2],3)

## log_area 
##    0.186

abline(log_mod, col = "firebrick3", lwd = 2)

The slope of 0.186 is the rate at which $S$ increases against area and therefore our estimate of $\beta$-diversity.

2. Pairwise level: similarity indices

Contrarily to the previous sections, we now evaluate $\beta$-diversity for each pair of sites.

Each pair of sites can share some species and have unique set of species, like in the following figure:

Example with two plots.

with $a$ the set of species shared by both plots, $b$ the species unique to plot 1 and $c$ the species unique to plot 2.

The idea behind this concept is to find an index that identify the sites that share a lot of species and the ones that are distant, regarding their species composition.

2.1. Plot of abc

Let us build a simple virtual matrix to visualize $abc$ with simple plots.

virt <- matrix(c(1, 1, 1, 0, 0,
                 1, 1, 1, 0, 0,
                 1, 0, 0, 0, 0,
                 0, 0, 0, 1, 1,
                 0, 1, 1, 1, 0),
               byrow = TRUE, nrow = 5, ncol = 5,
               dimnames = list(c("plot1","plot2","plot3","plot4", "plot5"),
                               c("sp1", "sp2", "sp3", "sp4", "sp5")))
virt

##       sp1 sp2 sp3 sp4 sp5
## plot1   1   1   1   0   0
## plot2   1   1   1   0   0
## plot3   1   0   0   0   0
## plot4   0   0   0   1   1
## plot5   0   1   1   1   0

We can see that plots 1 and 2 are identical, plot 3 is nested within 1 and 2, plot 4 shares no species with the first three plots. Plot 5 displays an intermediate situation.

Let’s manually calculate the number of shared and unique species for a given pair of virtual plots.

Calculate the number of unique and shared species for the two virtual plots 1 and 5.

Solution

# Subset of the binary matrix with just plots 1 and 2
virt_15 <- virt[c(1, 5), ]

# Number of species in each plot
rowSums(virt_15)

## plot1 plot5 
##     3     3

# Number of species in both plots
ncol(virt_15[, colSums(virt_15) == 2])

## [1] 2

# Number of unique species in each plot
rowSums(virt_15) - ncol(virt_15[, colSums(virt_15) == 2])

## plot1 plot5 
##     1     1

To compute automatically these numbers, we can use betadiver() function from vegan package. With method = NA, no index is calculated, but instead an object of class betadiver is returned. This is a list of the elements $a, $b and $c.

Each element is a distance matrix of size $n\times(n-1)/2$ with $n$ the number of plots.

We can also plot them in a triangular plot showing how each pair of plots is similar, as suggested by (Koleff, Gaston, and Lennon 2003).

beta_virt <- betadiver(virt, method = NA)
# a
beta_virt$a

##       plot1 plot2 plot3 plot4
## plot2     3                  
## plot3     1     1            
## plot4     0     0     0      
## plot5     2     2     0     1

length(beta_virt$a)

## [1] 10

5*4/2

## [1] 10

# b
beta_virt$b

##       plot1 plot2 plot3 plot4
## plot2     0                  
## plot3     0     0            
## plot4     2     2     2      
## plot5     1     1     3     2

# c
beta_virt$c

##       plot1 plot2 plot3 plot4
## plot2     0                  
## plot3     2     2            
## plot4     3     3     1      
## plot5     1     1     1     1

# Triangular plot
plot(betadiver(virt, method = NA), pch = 16, cex = 2)
legend("topleft", legend = "Ternary plot", bty = "n")

If we draw the same triangular plot on BCI data, what do we get?

beta_abc <- betadiver(BCI, method = NA)
# a
as.matrix(beta_abc$a)[1:3, 1:3] #visualizing only the first 3 plots as an example

##       Plot1 Plot2 Plot3
## Plot1     0    64    64
## Plot2    64     0    62
## Plot3    64    62     0

# b
as.matrix(beta_abc$b)[1:3, 1:3]

##       Plot1 Plot2 Plot3
## Plot1     0    20    26
## Plot2    20     0    28
## Plot3    26    28     0

# Plot
plot(betadiver(BCI, method = NA))
legend("topleft", legend = "Barro Colorado Island", bty = "n")

Low heterogeneity is found among BCI communities, all plots are being clumped in one part of the triangular space. We could say that BCI plots are similarly dissimilar.

2.2. Pairwise distances

Many distances and metrics exist, the three following ones being the most important. With $a$ the number of species shared by two sites, $b$ the number of species only in the first site and $c$ the number of species only in the second one, we have:

Sorensen (similarity) \[ Sorensen = \frac{2a}{2a + b + c} \]

Therefore, $\beta$-diversity being a metric based on dissimilarity, we get:

\[ \beta_{sor} = 1- \frac{2a}{2a + b + c} \]

Jaccard (similarity) \[ Jaccard = \frac{a}{a + b + c} \]

and similarly:

\[ \beta_{jac} = 1- \frac{a}{a + b + c} \]

Simpson (dissimilarity)

\[ \beta_{sim} = \frac{min(b, c)}{a + min(b, c)} \]

Calculate the Sorensen similarity between the first two plots (Plot1 and Plot2) of the BCI dataset.

Solution

BCI_12 <- BCI_bin[1:2, ]
# Number of species in each plot
rowSums(BCI_12)

## Plot1 Plot2 
##    93    84

# Number of species in both plots
ncol(BCI_12[, colSums(BCI_12)==2])

## [1] 64

# Number of unique species in each plot
rowSums(BCI_12) - ncol(BCI_12[, colSums(BCI_12)==2])

## Plot1 Plot2 
##    29    20

# Sorensen similarity index
(2*64)/(2*64+20+29)

## [1] 0.7231638

betadiver() function from vegan allows us to compute these indices.

# Sorensen similarity
sorBCI <- betadiver(BCI_bin, method = "sor")
as.matrix(sorBCI)[1:2, 1:2] # visualizing the first two plots as a matrix

##           Plot1     Plot2
## Plot1 0.0000000 0.7231638
## Plot2 0.7231638 0.0000000

2*64/(2*64 + 20 + 29)

## [1] 0.7231638

# Simpson dissimilarity
simBCI <- betadiver(BCI_bin, method = "sim")
as.matrix(simBCI)[1:2, 1:2]

##           Plot1     Plot2
## Plot1 0.0000000 0.2380952
## Plot2 0.2380952 0.0000000

min(c(20, 29))/(64 + min(c(20, 29)))

## [1] 0.2380952

# nb. If i want to "convert" simpson to a similarity index to compare to the Sorensen one: 
1- min(c(20, 29))/(64 + min(c(20, 29)))

## [1] 0.7619048

# Jaccard similarity
jBCI <- betadiver(BCI_bin, method = "j")
as.matrix(jBCI)[1:2, 1:2]

##           Plot1     Plot2
## Plot1 0.0000000 0.5663717
## Plot2 0.5663717 0.0000000

64/(64 + 20 + 29)

## [1] 0.5663717

# Plot the indices against each other
par(mfrow = c(1, 3))
plot(jBCI, sorBCI, xlab = "Jaccard", ylab= "Sorensen",
     pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))
plot(jBCI, simBCI, xlab = "Jaccard", ylab= "Simpson",
     pch = 16, col = rgb(0, 0, 0, 0.1))
plot(sorBCI, simBCI, xlab = "Sorensen", ylab= "Simpson",
     pch = 16, col = rgb(0, 0, 0, 0.1))

Note
The above betadiver() function computes the original indices. These can quantify either similarity or dissimilarity between plots. When classifying plots between each other, dissimilarity is often used. Dissimilarity simply equals 1 - similarity.

vegdist() returns the dissimilarity between plots.

dissimBCI <- vegdist(BCI_bin, method = "bray") # using Bray-curtis method with presence/absense data
# equals to the Sorensen index (Re: lecture)!
as.matrix(dissimBCI)[1:3, 1:3]

##           Plot1     Plot2     Plot3
## Plot1 0.0000000 0.2768362 0.3005464
## Plot2 0.2768362 0.0000000 0.2873563
## Plot3 0.3005464 0.2873563 0.0000000

1 - as.matrix(sorBCI)[1:3, 1:3]

##           Plot1     Plot2     Plot3
## Plot1 1.0000000 0.2768362 0.3005464
## Plot2 0.2768362 1.0000000 0.2873563
## Plot3 0.3005464 0.2873563 1.0000000

# Plot: betadiver versus vegdist
par(mfrow = c(1, 2))
plot(dissimBCI, sorBCI, xlab = "Sorensen dissimilarity (vegdist)",
     ylab= "Sorensen similarity (betadiver)",
     pch = 16, col = rgb(0, 0, 0, 0.1))
plot(dissimBCI, 1 - sorBCI,
     xlab = "Sorensen dissimilarity (vegdist)",
     ylab= "Sorensen dissimilarity (1 - betadiver)",
     pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))

vegdist() can also be applied on abundances data, with the use of Bray-Curtis index:

\[ \beta_{ij}=1-\frac{2C_{ij}}{S_i + S_j} \]

with $C_{ij}$ the sum of the lesser individuals for only common species in both $i$ and $j$, $S_i$ the number of individuals in plot $i$ and $S_j$ the number of individuals in plot $j$.

# Presence/absence data
dissimBCI <- vegdist(BCI_bin, method = "bray")
# With abundance data
dissimBCI_ab <- vegdist(BCI, method = "bray")
as.matrix(dissimBCI_ab)[1:3, 1:3]

##           Plot1     Plot2     Plot3
## Plot1 0.0000000 0.2706682 0.3501647
## Plot2 0.2706682 0.0000000 0.2873051
## Plot3 0.3501647 0.2873051 0.0000000

# Plot: betadiver versus vegdist
par(mfrow = c(1, 1))
plot(dissimBCI, dissimBCI_ab, xlab = "Sorensen binary",
     ylab= "Sorensen abundance", pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))

The overall pattern is similar but including abundances can lead to a better understanding of the changing of species composition across plots.

2.3. Partitionning of $\beta$ diversity

$\beta$-diversity can be divided into a nestedness and a turnover component. The nestedness component quantifies the part of $\beta$-diversity that results from the mere difference in species numbers. The turnover component quantifies the part that actually results from differences in species identities (Baselga, 2010). The Simpson index is a common measure of the turnover component because it is independent of differences in the number of species in the plots. Jaccard and Sorensen quantify the total beta diversity. Therefore:

\[ \beta_{nes} = \beta_{sor} - \beta_{sim} \]

\[ \beta_{nes} = \beta_{jac} - \beta_{sim} \] So we can calculate the nestedness component:

beta_nest <- dissimBCI - simBCI
as.matrix(beta_nest)[1:3, 1:3]

##            Plot1      Plot2      Plot3
## Plot1 0.00000000 0.03874092 0.01165756
## Plot2 0.03874092 0.00000000 0.02545156
## Plot3 0.01165756 0.02545156 0.00000000

# Mean beta-diversity (here with Sorensen index)
mean(dissimBCI)

## [1] 0.3399075

# Mean turnover component
mean(simBCI)

## [1] 0.3090427

# Mean nestedness component
mean(beta_nest)

## [1] 0.03086482

df_boxplot <- data.frame(Dissimilarity = c(dissimBCI, simBCI, beta_nest),
                         Metric = rep(c("Beta", "Turnover", "Nestedness"),
                                      each = length(dissimBCI)))

# Plot
par(mfrow = c(1, 2))
plot(simBCI, beta_nest, xlab = "Turnover", ylab = "Nestedness",
     pch = 16, col = rgb(0, 0, 0, 0.1))
boxplot(Dissimilarity ~ Metric, df_boxplot, col = "grey")

Note
Other package exist to calculate $\beta$-diversity like betarpart. It notably allows to compute the three components of $\beta$-diversity (with turnover and nestedness) in a faster way.

library(betapart)
# watch out: beta.pair relies on dissimilarity and not similarity
# for example: the formula of beta_sorensen is (b+c)/(2a+b+c)
betaBCI <- beta.pair(BCI_bin, index.family = "sorensen")
str(betaBCI)

# Convert dist object into matrix
sorBCI <- as.matrix(betaBCI$beta.sor)
sorBCI[1:5, 1:5]
sorBCI[upper.tri(sorBCI)] <- NA
diag(sorBCI) <- NA
sorBCI[1:5, 1:5]

Bonus: highliting extreme points on abc plots

Bonus

From such a distance matrix, we can target the closest and most distances plots, regarding their species composition.

# What is the minimal distance between two plots?
min(sorBCI, na.rm = TRUE)

## [1] 0.4941176

which(sorBCI == min(sorBCI, na.rm = TRUE), arr.ind = TRUE)

## [1] 723

# Plots 27 and 32

# What is the maximal distance between two plots?
max(sorBCI, na.rm = TRUE)

## [1] 0.7914439

which(sorBCI == max(sorBCI, na.rm = TRUE), arr.ind = TRUE)

## [1] 954

# Plots 44 and 18

We can also highlight these extreme pairs in an “abc” triangular plot.

# Function to return dist index, given row i, column j and n number of plots
distdex <- function(i, j, n){
  n*(i-1) - i*(i-1)/2 + j-i
}

# Index of most distant pair
dissim_pair <- distdex(18, 44, 50)
paste("a=", betadiver(BCI, method = NA)$a[dissim_pair], "; b=",
      betadiver(BCI, method = NA)$b[dissim_pair], "; c=",
      betadiver(BCI, method = NA)$c[dissim_pair])

## [1] "a= 42 ; b= 39 ; c= 47"

# Index of closest pair
sim_pair <- distdex(27, 32, 50)
paste("a=", betadiver(BCI, method = NA)$a[sim_pair], "; b=",
      betadiver(BCI, method = NA)$b[sim_pair], "; c=",
      betadiver(BCI, method = NA)$c[sim_pair])

## [1] "a= 74 ; b= 14 ; c= 25"

# Color vector
nb_pairs <- length(betadiver(BCI_bin, method = "sor")) # n*(n-1)/2
col_abc <- c(rep("black", (dissim_pair - 1)),
             "dodgerblue",
             rep("black", (sim_pair - dissim_pair - 1)),
             "firebrick3",
             rep("black", (nb_pairs - sim_pair)))

# Size vector
size_abc <- c(rep(0.5, (dissim_pair - 1)), 2,
              rep(0.5, (sim_pair - dissim_pair - 1)), 2,
              rep(0.5, (nb_pairs - sim_pair)))

# Plot
plot(betadiver(BCI, method = NA), pch = 16, cex = size_abc, col = col_abc)
legend(x = 0.6, y = 0.8,
       legend = c("Most dissimilar", "Other pairs", "Most similar"),
       col = c("dodgerblue", "black", "firebrick3"), pch = 16,
       pt.cex = c(2, 0.5, 2), bty = "n")