Before starting
If you would like to see this tutorial in your viewer within RStudio,
please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson5.html",
destfile = file.path(dir, "lesson5.html"))
htmlFile <- file.path(dir, "lesson5.html")
rstudioapi::viewer(htmlFile)In this practical, we will go through the three main approaches to compute \(\beta\)-diversity.
Species turnover between 3 plots.
For this practical, we will use the same dataset we used for the previous practicals, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset.
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
# Transpose the dataset to have plots as rows and species as columns
BCI <- t(BCI)
# Presence/absence matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1Lande’s additive partitionning. \[ \beta = \gamma - \overline{\alpha} \]
Calculate the Lande’s index on BCI data (Tips: you can use the functionspecnumber() included in the vegan package to
calculate the number of species per plot).
library(vegan)
alpha <- specnumber(BCI) # or use the binary site-species matrix
gamma <- ncol(BCI[, colSums(BCI) > 0])
gamma - mean(alpha)## [1] 134.22
Whittaker’s multiplicative partitionning. \[ \beta = \frac{\gamma}{\overline{\alpha}} \]
Calculate the Whittaker’s index.## [1] 2.478519
The slope of the species-area curve also gives us a measure of \(\beta\) diversity since it tells us how
many new species are added when we consider a bigger area.
Considering the BCI data, we know that each plot has a 1 hectare
surface. We can therefore reconstruct a species-surface curve using
random sample of a varying number of plots. At each random sample, we
calculate the average species number. The approach is similar to the
species accumulation curve we saw in the practical 4.
To do so, we will use a for-loop with the \(\alpha\) richness of random plot
aggregates.
As an example, let’s start with a random sample of two plots.
set.seed(1) # random seed
random_pair <- sample(names(alpha), 2) # we "randomly" pick two plots
random_pair## [1] "Plot4" "Plot39"
# Warning: some species occur in both plots so
# we cannot simply take the sum of the alpha diversity values
sum(alpha[random_pair]) # species richness if we do a simple sum## [1] 178
# we need to consider the number of unique species!
tmp <- BCI_bin[random_pair, ]
sum(colSums(tmp) > 0)## [1] 111
# now we repeat this 100 times to have a solid average estimate
mean_S <- replicate(100, {
tmp <- BCI_bin[sample(rownames(BCI_bin), 2), ]
ncol(tmp[, colSums(tmp) > 0])
})
# mean of the 100 replicates
length(mean_S)## [1] 100
## [1] 121.12
We now repeat this step with several aggregates of plot number (and therefore surface):
# and now a for-loop for several sample sizes
sample_size <- c(1:50)
mean_S <- c()
for(i in sample_size){
S <- replicate(100, {
tmp <- BCI_bin[sample(rownames(BCI_bin), i), ,drop=FALSE]
sum(colSums(tmp) > 0)})
mean_S[i] <- mean(S)
}
# data.frame with species richness and area
sp_area <- data.frame(area = sample_size,
S = mean_S)Now the plot.
plot(sp_area$area, sp_area$S, pch = 16, type = "b", cex = 0.5, lwd = 1,
main = "Species-area curve",
xlab = "Area (ha)", ylab = "Species richness")\(\beta\)-diversity estimate corresponds to the exponent \(z\) of the Arrhenius model \(S = cA^z\), where \(S\) is the species number and \(A\) is the area.
To retrieve it, we can calculate a linear model of the log-transformed number of species as a function of the log-transformed area, since we have: \[ log_{10}(S) = log_{10}(c) + zlog_{10}(A) \]
# Log-log scale
sp_area$log_area <- log(sp_area$area, base = 10)
sp_area$log_S <- log(sp_area$S, base = 10)
plot(sp_area$log_area, sp_area$log_S, pch = 16, type = "b", cex = 0.5,
lwd = 1, main = "Species-area curve",
xlab = "Log Area (ha)", ylab = "Log species richness")
# Extract linear coefficient
log_mod <- lm(log_S ~ log_area, data = sp_area)
log_mod##
## Call:
## lm(formula = log_S ~ log_area, data = sp_area)
##
## Coefficients:
## (Intercept) log_area
## 2.0547 0.1858
## log_area
## 0.186
The slope of 0.186 is the rate at which \(S\) increases against area and therefore our estimate of \(\beta\)-diversity.
Contrarily to the previous sections, we now evaluate \(\beta\)-diversity for each pair of sites.
Each pair of sites can share some species and have unique set of species, like in the following figure:
Example with two plots.
with \(a\) the set of species shared by both plots, \(b\) the species unique to plot 1 and \(c\) the species unique to plot 2.
The idea behind this concept is to find an index that identify the sites that share a lot of species and the ones that are distant, regarding their species composition.
Let us build a simple virtual matrix to visualize \(abc\) with simple plots.
virt <- matrix(c(1, 1, 1, 0, 0,
1, 1, 1, 0, 0,
1, 0, 0, 0, 0,
0, 0, 0, 1, 1,
0, 1, 1, 1, 0),
byrow = TRUE, nrow = 5, ncol = 5,
dimnames = list(c("plot1","plot2","plot3","plot4", "plot5"),
c("sp1", "sp2", "sp3", "sp4", "sp5")))
virt## sp1 sp2 sp3 sp4 sp5
## plot1 1 1 1 0 0
## plot2 1 1 1 0 0
## plot3 1 0 0 0 0
## plot4 0 0 0 1 1
## plot5 0 1 1 1 0
We can see that plots 1 and 2 are identical, plot 3 is nested within 1 and 2, plot 4 shares no species with the first three plots. Plot 5 displays an intermediate situation.
Let’s manually calculate the number of shared and unique species for a given pair of virtual plots.
Calculate the number of unique and shared species for the two virtual plots 1 and 5.# Subset of the binary matrix with just plots 1 and 2
virt_15 <- virt[c(1, 5), ]
# Number of species in each plot
rowSums(virt_15)## plot1 plot5
## 3 3
## [1] 2
## plot1 plot5
## 1 1
To compute automatically these numbers, we can use
betadiver() function from vegan package. With
method = NA, no index is calculated, but instead an object
of class betadiver is returned. This is a list of the
elements $a, $b and $c.
Each element is a distance matrix of size \(n\times(n-1)/2\) with \(n\) the number of plots.
We can also plot them in a triangular plot showing how each pair of plots is similar, as suggested by (Koleff, Gaston, and Lennon 2003).
## plot1 plot2 plot3 plot4
## plot2 3
## plot3 1 1
## plot4 0 0 0
## plot5 2 2 0 1
## [1] 10
## [1] 10
## plot1 plot2 plot3 plot4
## plot2 0
## plot3 0 0
## plot4 2 2 2
## plot5 1 1 3 2
## plot1 plot2 plot3 plot4
## plot2 0
## plot3 2 2
## plot4 3 3 1
## plot5 1 1 1 1
# Triangular plot
plot(betadiver(virt, method = NA), pch = 16, cex = 2)
legend("topleft", legend = "Ternary plot", bty = "n")If we draw the same triangular plot on BCI data, what do we get?
beta_abc <- betadiver(BCI, method = NA)
# a
as.matrix(beta_abc$a)[1:3, 1:3] #visualizing only the first 3 plots as an example## Plot1 Plot2 Plot3
## Plot1 0 64 64
## Plot2 64 0 62
## Plot3 64 62 0
## Plot1 Plot2 Plot3
## Plot1 0 20 26
## Plot2 20 0 28
## Plot3 26 28 0
# Plot
plot(betadiver(BCI, method = NA))
legend("topleft", legend = "Barro Colorado Island", bty = "n")Low heterogeneity is found among BCI communities, all plots are being clumped in one part of the triangular space. We could say that BCI plots are similarly dissimilar.
Many distances and metrics exist, the three following ones being the most important. With \(a\) the number of species shared by two sites, \(b\) the number of species only in the first site and \(c\) the number of species only in the second one, we have:
Therefore, \(\beta\)-diversity being a metric based on dissimilarity, we get:
\[ \beta_{sor} = 1- \frac{2a}{2a + b + c} \]
and similarly:
\[ \beta_{jac} = 1- \frac{a}{a + b + c} \]
\[ \beta_{sim} = \frac{min(b, c)}{a + min(b, c)} \]
Calculate the Sorensen similarity between the first two plots (Plot1 and Plot2) of the BCI dataset.## Plot1 Plot2
## 93 84
## [1] 64
## Plot1 Plot2
## 29 20
## [1] 0.7231638
betadiver() function from vegan allows us
to compute these indices.
# Sorensen similarity
sorBCI <- betadiver(BCI_bin, method = "sor")
as.matrix(sorBCI)[1:2, 1:2] # visualizing the first two plots as a matrix## Plot1 Plot2
## Plot1 0.0000000 0.7231638
## Plot2 0.7231638 0.0000000
## [1] 0.7231638
## Plot1 Plot2
## Plot1 0.0000000 0.2380952
## Plot2 0.2380952 0.0000000
## [1] 0.2380952
# nb. If i want to "convert" simpson to a similarity index to compare to the Sorensen one:
1- min(c(20, 29))/(64 + min(c(20, 29)))## [1] 0.7619048
## Plot1 Plot2
## Plot1 0.0000000 0.5663717
## Plot2 0.5663717 0.0000000
## [1] 0.5663717
# Plot the indices against each other
par(mfrow = c(1, 3))
plot(jBCI, sorBCI, xlab = "Jaccard", ylab= "Sorensen",
pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))
plot(jBCI, simBCI, xlab = "Jaccard", ylab= "Simpson",
pch = 16, col = rgb(0, 0, 0, 0.1))
plot(sorBCI, simBCI, xlab = "Sorensen", ylab= "Simpson",
pch = 16, col = rgb(0, 0, 0, 0.1))
Note
The above betadiver() function computes the original
indices. These can quantify either similarity or dissimilarity between
plots. When classifying plots between each other, dissimilarity is often
used. Dissimilarity simply equals 1 - similarity.
vegdist() returns the dissimilarity between plots.
dissimBCI <- vegdist(BCI_bin, method = "bray") # using Bray-curtis method with presence/absense data
# equals to the Sorensen index (Re: lecture)!
as.matrix(dissimBCI)[1:3, 1:3]## Plot1 Plot2 Plot3
## Plot1 0.0000000 0.2768362 0.3005464
## Plot2 0.2768362 0.0000000 0.2873563
## Plot3 0.3005464 0.2873563 0.0000000
## Plot1 Plot2 Plot3
## Plot1 1.0000000 0.2768362 0.3005464
## Plot2 0.2768362 1.0000000 0.2873563
## Plot3 0.3005464 0.2873563 1.0000000
# Plot: betadiver versus vegdist
par(mfrow = c(1, 2))
plot(dissimBCI, sorBCI, xlab = "Sorensen dissimilarity (vegdist)",
ylab= "Sorensen similarity (betadiver)",
pch = 16, col = rgb(0, 0, 0, 0.1))
plot(dissimBCI, 1 - sorBCI,
xlab = "Sorensen dissimilarity (vegdist)",
ylab= "Sorensen dissimilarity (1 - betadiver)",
pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))vegdist() can also be applied on abundances data, with
the use of Bray-Curtis index:
\[ \beta_{ij}=1-\frac{2C_{ij}}{S_i + S_j} \]
with \(C_{ij}\) the sum of the lesser individuals for only common species in both \(i\) and \(j\), \(S_i\) the number of individuals in plot \(i\) and \(S_j\) the number of individuals in plot \(j\).
# Presence/absence data
dissimBCI <- vegdist(BCI_bin, method = "bray")
# With abundance data
dissimBCI_ab <- vegdist(BCI, method = "bray")
as.matrix(dissimBCI_ab)[1:3, 1:3]## Plot1 Plot2 Plot3
## Plot1 0.0000000 0.2706682 0.3501647
## Plot2 0.2706682 0.0000000 0.2873051
## Plot3 0.3501647 0.2873051 0.0000000
# Plot: betadiver versus vegdist
par(mfrow = c(1, 1))
plot(dissimBCI, dissimBCI_ab, xlab = "Sorensen binary",
ylab= "Sorensen abundance", pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))The overall pattern is similar but including abundances can lead to a better understanding of the changing of species composition across plots.
\(\beta\)-diversity can be divided into a nestedness and a turnover component. The nestedness component quantifies the part of \(\beta\)-diversity that results from the mere difference in species numbers. The turnover component quantifies the part that actually results from differences in species identities (Baselga, 2010). The Simpson index is a common measure of the turnover component because it is independent of differences in the number of species in the plots. Jaccard and Sorensen quantify the total beta diversity. Therefore:
\[ \beta_{nes} = \beta_{sor} - \beta_{sim} \]
or
\[ \beta_{nes} = \beta_{jac} - \beta_{sim} \] So we can calculate the nestedness component:
## Plot1 Plot2 Plot3
## Plot1 0.00000000 0.03874092 0.01165756
## Plot2 0.03874092 0.00000000 0.02545156
## Plot3 0.01165756 0.02545156 0.00000000
## [1] 0.3399075
## [1] 0.3090427
## [1] 0.03086482
df_boxplot <- data.frame(Dissimilarity = c(dissimBCI, simBCI, beta_nest),
Metric = rep(c("Beta", "Turnover", "Nestedness"),
each = length(dissimBCI)))
# Plot
par(mfrow = c(1, 2))
plot(simBCI, beta_nest, xlab = "Turnover", ylab = "Nestedness",
pch = 16, col = rgb(0, 0, 0, 0.1))
boxplot(Dissimilarity ~ Metric, df_boxplot, col = "grey")Note
Other package exist to calculate \(\beta\)-diversity like
betarpart. It notably allows to compute the three
components of \(\beta\)-diversity (with
turnover and nestedness) in a faster way.
library(betapart)
# watch out: beta.pair relies on dissimilarity and not similarity
# for example: the formula of beta_sorensen is (b+c)/(2a+b+c)
betaBCI <- beta.pair(BCI_bin, index.family = "sorensen")
str(betaBCI)
# Convert dist object into matrix
sorBCI <- as.matrix(betaBCI$beta.sor)
sorBCI[1:5, 1:5]
sorBCI[upper.tri(sorBCI)] <- NA
diag(sorBCI) <- NA
sorBCI[1:5, 1:5]From such a distance matrix, we can target the closest and most distances plots, regarding their species composition.
## [1] 0.4941176
## [1] 723
## [1] 0.7914439
## [1] 954
We can also highlight these extreme pairs in an “abc” triangular plot.
# Function to return dist index, given row i, column j and n number of plots
distdex <- function(i, j, n){
n*(i-1) - i*(i-1)/2 + j-i
}
# Index of most distant pair
dissim_pair <- distdex(18, 44, 50)
paste("a=", betadiver(BCI, method = NA)$a[dissim_pair], "; b=",
betadiver(BCI, method = NA)$b[dissim_pair], "; c=",
betadiver(BCI, method = NA)$c[dissim_pair])## [1] "a= 42 ; b= 39 ; c= 47"
# Index of closest pair
sim_pair <- distdex(27, 32, 50)
paste("a=", betadiver(BCI, method = NA)$a[sim_pair], "; b=",
betadiver(BCI, method = NA)$b[sim_pair], "; c=",
betadiver(BCI, method = NA)$c[sim_pair])## [1] "a= 74 ; b= 14 ; c= 25"
# Color vector
nb_pairs <- length(betadiver(BCI_bin, method = "sor")) # n*(n-1)/2
col_abc <- c(rep("black", (dissim_pair - 1)),
"dodgerblue",
rep("black", (sim_pair - dissim_pair - 1)),
"firebrick3",
rep("black", (nb_pairs - sim_pair)))
# Size vector
size_abc <- c(rep(0.5, (dissim_pair - 1)), 2,
rep(0.5, (sim_pair - dissim_pair - 1)), 2,
rep(0.5, (nb_pairs - sim_pair)))
# Plot
plot(betadiver(BCI, method = NA), pch = 16, cex = size_abc, col = col_abc)
legend(x = 0.6, y = 0.8,
legend = c("Most dissimilar", "Other pairs", "Most similar"),
col = c("dodgerblue", "black", "firebrick3"), pch = 16,
pt.cex = c(2, 0.5, 2), bty = "n")