In this practical, we will go through the main diversity indices used in ecology to compare plots, based on the abundance distribution of the species they host. For example, the two following plots have the same species richness and number of individuals, but the abundances of the three species is not distributed in the same way. How to rank their diversity?
How to compare these two plots?
Before starting
If you would like to see this practical in your viewer within RStudio, please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson4.html",
destfile = file.path(dir, "lesson4.html"))
htmlFile <- file.path(dir, "lesson4.html")
rstudioapi::viewer(htmlFile)We will use the same dataset for this practical than for the practical 2, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset.
Let us first import the site-species matrix from BCI data.
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
BCI <- t(BCI) # transposing the matrix to have sites in rows
BCI[4:6, 5:7]## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 3 0 18
## Plot5 1 1 3
## Plot6 0 0 2
The matrix is filled with abundances. We here create a binary site-species matrix to access in an easier way the species richness of each plot.
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
BCI_bin[4:6, 5:7]## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 1 0 1
## Plot5 1 1 1
## Plot6 0 0 1
Now, from this presence/absence matrix, we can count how many different species each plot has. This is a first measure of diversity.
# Histogram
hist(rowSums(BCI_bin), col = "grey",
main = expression(paste(alpha, " richness")), # plot title
xlab = "Number of species") # x-axis titleHowever, as we saw in the lecture, species richness is limited to rank the diversity of several plots. We should account for the evenness of species in each plot.
Rank-abundance distribution (RAD) is a classic way to represent how the abundances are structured in a community. Note that the rank-abundance curves are also called species-abundance distribution in the literature (SAD, McGill et al. 2007).
The function rank() returns the rank of any feature and can therefore be used to rank the abundances of species within their communities.
rank(c(10, 2, 5))## [1] 3 1 2
# ascending order
rank(-c(10, 2, 5))## [1] 1 3 2
# tie-method
rank(c(1, 10, 1))## [1] 1.5 3.0 1.5
rank(c(1, 10, 1), ties.method = "random")## [1] 2 3 1
Let us take two plots (plot 35 and plot 7) with similar richness as examples to illustrate how their abundances can be shaped in a very different way.
# Extract the first community from the site-species table
ex_plot1 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[35, ]))
head(ex_plot1)## sp ab
## 1 Abarema.macradenium 0
## 2 Acacia.melanoceras 0
## 3 Acalypha.diversifolia 0
## 4 Acalypha.macrostachya 0
## 5 Adelia.triloba 6
## 6 Aegiphila.panamensis 0
# Removal of absent species
ex_plot1 <- ex_plot1[which(ex_plot1$ab != 0), ]
dim(ex_plot1)## [1] 83 2
# Add rank of species in the first community
# ?rank # help page of a given function
ex_plot1$rank <- rank(-ex_plot1$ab, ties.method = "random")
# Ordering data before plotting
ex_plot1 <- ex_plot1[order(ex_plot1$rank), ]
# Plot
plot(ex_plot1$rank, ex_plot1$ab, type = "b",
col = rgb(0.4, 0.4, 0.8, 0.6), pch = 16, lwd = 1,
main = "RAD",
xlab = "Rank", ylab = "Abundances")
# Let's add community 7 on this plot for comparison
ex_plot2 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[7, ]))
ex_plot2 <- ex_plot2[which(ex_plot2$ab != 0), ]
ex_plot2$rank <- rank(-ex_plot2$ab, ties.method = "random")
ex_plot2 <- ex_plot2[order(ex_plot2$rank), ]
points(ex_plot2$rank, ex_plot2$ab, pch = 16, type = "both",
col = rgb(1, 0.4, 0.2, 0.6))
legend(40, 250, c("35 (83 species)", "7 (82 species)"), title = "Plot",
text.font = 3, lwd = 2, seg.len = 1,
col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
lty = 1, bg = "gray90")As for the majority of communities of living organisms in the world, the first BCI plot contains many rare species and just a few common species. The second one also follows this trend but in a less straightforward way as its abundances seem more even. This visual impression can also arise from histograms.
par(mfrow = c(1, 2))
hist(ex_plot1$ab, main = "Plot 35", xlab = "Abundances", col = "grey")
hist(ex_plot2$ab, main = "Plot 7", xlab = "Abundances", col = "grey")Rank-abundance distributions can also be visualized using relative abundances. Relative abundances are local abundances divided by the total number of individuals in a community, such as it sums to 1 in every community.
\[ \sum_{i=1}^{n}a_{i} = 1 \]
with \(n\) the number of species in a community and \(a_i\) the relative abundance of species \(i\).
Create a BCI matrix filled with relative abundances.
# Create matrix with relative abundances
BCI_rel <- BCI/rowSums(BCI)# Our three types of matrix
head(rowSums(BCI)); head(rowSums(BCI_bin)); head(rowSums(BCI_rel))## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 448 435 463 508 505 412
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 93 84 90 94 101 85
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 1 1 1 1 1 1
# Add relative abundances
ex_plot1 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[35, ]))
ex_plot1$rank <- rank(-ex_plot1$ab, ties.method = "random")
ex_plot1 <- ex_plot1[which(ex_plot1$ab != 0), ]
ex_plot1$ab_rel <- as.numeric(BCI_rel[35, ][BCI_rel[35, ] > 0])
ex_plot1$rank_rel <- rank(-ex_plot1$ab_rel, ties.method = "random")
ex_plot1 <- ex_plot1[order(ex_plot1$rank_rel), ]
ex_plot2 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[7, ]))
ex_plot2$rank <- rank(-ex_plot2$ab, ties.method = "random")
ex_plot2 <- ex_plot2[which(ex_plot2$ab != 0), ]
ex_plot2$ab_rel <- as.numeric(BCI_rel[7, ][BCI_rel[7, ] > 0])
ex_plot2$rank_rel <- rank(-ex_plot2$ab_rel, ties.method = "random")
ex_plot2 <- ex_plot2[order(ex_plot2$rank_rel), ]
plot(ex_plot1$rank_rel, ex_plot1$ab_rel, type = "b",
col = rgb(0.4, 0.4, 0.8, 0.6), pch = 16, lwd = 1,
main = "RAD, relative scales",
xlab = "Rank",
ylab = "Relative abundances")
points(ex_plot2$rank_rel, ex_plot2$ab_rel, pch = 16, type = "both",
col = rgb(1, 0.4, 0.2, 0.6))
legend(40, 0.4, c("35 (83 species)", "7 (82 species)"), title = "Plot",
text.font = 3, lwd = 2, seg.len = 1,
col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
lty = 1, bg = "gray90")In the next parts, we quantify the evenness of abundances of plots using two family of indices: the non-parametric and parametric indices.
Non-parametric indices do not rely on any statistical distribution. One of the most classical index in ecology is the Shannon index.
Shannon’s diversity H' is based on the proportion of individuals each species has in a community. For a species \(i\), its proportion of individuals \(p_i\) in a plot having \(N\) individuals writes as follow:
\[ p_i = \frac{n_i}{N} \]
From these proportions, the formula of Shannon is the following:
\[ H' = -\sum_{i=1}^Sp_ilog(p_i) \]
with \(S\) the species richness and \(p_i\) the proportion of individuals for species \(i\).
The idea behind Shannon’s index is that if these proportions are uneven across species, the Shannon’s index decreases.
Let’s take a simple virtual example to see how Shannon behaves. We here create a virtual site-species matrix with 4 communities and 4 species. In the first community, the four species have 10 individuals. In the other communities, the first species is more and more abundant, making the whole abundance distribution more and more uneven.
The function diversity() from vegan package enables us to calculate the index of Shannon.
virt <- matrix(10, nrow = 4, ncol = 4)
virt <- matrix(10, nrow = 4, ncol = 6,
dimnames = list(c("plot1", "plot2", "plot3", "plot4"),
c("sp1", "sp2", "sp3", "sp4", "sp5", "sp6")))
virt[, 1] <- c(10, 100, 1000, 10000)
library(vegan)
diversity(virt, index = "shannon")## plot1 plot2 plot3 plot4
## 1.79175947 1.17299347 0.26808398 0.03935448
Furthermore, the Shannon index increases if the number of species increases while the abundance distribution remains the same. For example, if we add a fifth community where not all species are present:
virt <- rbind(virt, plot5 = c(0, 0, 0, 10, 10, 10))
virt## sp1 sp2 sp3 sp4 sp5 sp6
## plot1 10 10 10 10 10 10
## plot2 100 10 10 10 10 10
## plot3 1000 10 10 10 10 10
## plot4 10000 10 10 10 10 10
## plot5 0 0 0 10 10 10
diversity(virt, index = "shannon")[c(1, 5)]## plot1 plot5
## 1.791759 1.098612
Let us now calculate this index on BCI data.
shan <- diversity(BCI, index = "shannon")
# Histogram
hist(shan, col = "grey", main = "Shannon diversity", xlab = "Value")One community has a very low Shannon index value, and therefore uneven distribution of abundances.
The Pielou’s index of evenness (or equitability) specifically quantifies the extent to which species abundances are unbalanced in the community. It ranges from 0 to 1 (abundances completely even), and decreases with the heterogeneity of abundances in one community.
Its formula is based on Shannon’s index:
\[ J' = \frac{H'}{H'_{max}} = \frac{H'}{log(S)} \] with \(S\) the species richness.
Using diversity() and specnumber() we can calculate it on BCI data:
pielou <- diversity(BCI, index = "shannon")/log(specnumber(BCI))
# Histogram
hist(pielou, col = "grey", main = "Pielou's evenness", xlab = "Value")The second most used index to measure evenness in ecology is probably the Simpson’s index. Like for Shannon, it is based on the proportion of individuals each species has in a community. Its formula is however different:
\[ D = 1 -\sum_{i=1}^Sp_i^2 \]
with \(p_i\) the proportion of individuals for species \(i\) and \(S\) the species richness.
We can calculate it on BCI data using the same diversity() function but with another argument.
simp <- diversity(BCI, index = "simpson")
invsimp <- diversity(BCI, "inv") # Inverse Simpson diversity
# Histogram
hist(simp, col = "grey", main = "Simpson diversity", xlab = "Value")The distribution of values also highlights a very uneven plot.
These indices are more or less stronger related to species richness. For example, here is the link between Shannon’s index and species richness in BCI:
plot(rowSums(BCI_bin), shan, pch = 16,
main = expression(paste(alpha, " richness versus Shannon diversity")),
xlab = "Species richness",
ylab = "Shannon")We can do the same for all the indices we computed and plot them. To do so, we can create a common data.frame with all the calculated indices.
Create a data.frame with species richness, Shannon, Simpson and Pielou indices.
# Create a data.frame with species richness and Shannon
shan <- data.frame(plot = names(shan),
shan = as.numeric(shan))
rich <- data.frame(plot = names(rowSums(BCI_bin)),
rich = as.numeric(rowSums(BCI_bin)))
all_div <- merge(rich, shan, by = "plot")
# Adding Simpson index
simp <- data.frame(plot = names(simp),
simp = as.numeric(simp))
all_div <- merge(all_div, simp, by = "plot")
# Adding Pielou index
pielou <- data.frame(plot = names(pielou),
pielou = as.numeric(pielou))
all_div <- merge(all_div, pielou, by = "plot")
dim(all_div)## [1] 50 5
head(all_div)## plot rich shan simp pielou
## 1 Plot1 93 4.018412 0.9746293 0.8865579
## 2 Plot10 94 3.889803 0.9663808 0.8561634
## 3 Plot11 87 3.859814 0.9658398 0.8642843
## 4 Plot12 84 3.698414 0.9550599 0.8347024
## 5 Plot13 93 3.982373 0.9692075 0.8786069
## 6 Plot14 98 4.017494 0.9718626 0.8762317
From this common table, we can plot a pair plot using psych package and showing the correlation between each index.
library(psych)
pairs.panels(all_div[, c("rich", "shan", "simp", "pielou")],
density = FALSE, ellipses = FALSE, hist.col = "grey")Again, one community is really uneven, especially when compared to the other communities. Which community is it?
# Which plot?
which.min(all_div$pielou)## [1] 29
As discussed in the lecture, the above diversity indices have some undesirable characteristics. For example, doubling the number of species with equal distribution of abundances does not necessarily lead to a doubling of the index values (non-linear properties). Furthermore, a community with \(S\) equally common species does not necessarily have a diversity of \(S\) and the unit of the index is not species, although both would be desirable. One solution to this problem is to convert the values into effective numbers of species, i.e. the number of equally common species that would be needed to reach a given index value.
The indices discussed can be transformed into effective numbers of species to meet the above characteristics.
In the case of Shannon \(H\), the transformation is achieved by doing \(e^H\).
In the case of Simpson \(D\), \(\frac{1}{1-D}\).
We will look at the properties of the indices using our artificially generated species-by-sites matrix:
virt## sp1 sp2 sp3 sp4 sp5 sp6
## plot1 10 10 10 10 10 10
## plot2 100 10 10 10 10 10
## plot3 1000 10 10 10 10 10
## plot4 10000 10 10 10 10 10
## plot5 0 0 0 10 10 10
diversity(virt, index = "shannon")## plot1 plot2 plot3 plot4 plot5
## 1.79175947 1.17299347 0.26808398 0.03935448 1.09861229
exp(diversity(virt, index = "shannon"))## plot1 plot2 plot3 plot4 plot5
## 6.000000 3.231652 1.307457 1.040139 3.000000
diversity(virt, index = "simpson")## plot1 plot2 plot3 plot4 plot5
## 0.833333333 0.533333333 0.092517007 0.009920547 0.666666667
1/(1-diversity(virt, index = "simpson"))## plot1 plot2 plot3 plot4 plot5
## 6.000000 2.142857 1.101949 1.010020 3.000000
The transformation leads to a situation in which the diversity in communities with constant abundances corresponds to the number of species and, accordingly, a community with twice the number of species has twice the diversity with the same abundance.
However, the values of the two indices differ for communities with non-constant abundances, as Simpson gives greater weight to abundances than Shannon. Hill numbers offers a generalisation of this weighting of abundances while retaining the desired characteristics. The calculation is done as
\[ D^q = \biggl(\sum_{i=1}^Sp_i^{q}\biggr)^\frac{1}{(1-q)} \]
where \(p_i\) is the relative abundance of the species \(i\) and \(S\) is the total number of species. The parameter \(q\) determines the sensitivity of the index to the abundances:
\(q = 0\) corresponds to the number of species,
\(q = 1\) corresponds to \(e^H\) (Shannon) and
\(q = 2\) corresponds to \(\frac{1}{1-D}\) (Simpson).
However, other values for \(q\) can also be selected. The function hill_taxa() from hillR package allows us to do so:
library(hillR)
hill_taxa(BCI, q = 0) # species richness## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8 Plot9 Plot10 Plot11
## 93 84 90 94 101 85 82 88 90 94 87
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22
## 84 93 98 93 93 93 89 109 100 99 91
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33
## 99 95 105 91 99 85 86 97 77 88 86
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44
## 92 83 92 88 82 84 80 102 87 86 81
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50
## 81 86 102 91 91 93
identical(hill_taxa(BCI, q = 0), specnumber(BCI))## [1] TRUE
# With BCI
hill_BCI <- data.frame(q = seq(0, 3, by = 0.25),
Diversity35 = NA,
Diversity7 = NA)
for (i in 1:nrow(hill_BCI)){
hill_BCI[i, c("Diversity35", "Diversity7")] <- hill_taxa(BCI[c(35, 7),],
q = hill_BCI$q[i])
}
plot(hill_BCI$q, hill_BCI$Diversity7, type = "l",
xlab = "q", ylab = "Diversity",
col = rgb(1, 0.4, 0.2, 0.6), ylim = c(0, 90))
points(hill_BCI$q, hill_BCI$Diversity35, type = "l",
xlab = "q", ylab = "Diversity", col = rgb(0.4, 0.4, 0.8, 0.6))
legend(1.7, 85, c("Plot 35", "Plot 7"), title = "Plot",
text.font = 3, lwd = 2, seg.len = 1,
col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
lty = 1)Plot 7 has a greater diversity than plot 35 at any \(q\), even if they have a comparable species richness.
The Fisher’s \(\alpha\) index is a parameter describing the standard log-series distribution that can be fitted to the observed pattern of relative abundances. It is widely used as a diversity measure and is one of the most recommended when the number of individuals in a plot exceeds 1000.
Note that other distributions can be fitted to an abundance distribution and that log-series is not the best one measures but we won’t do it in this practical (for more details, see ?radfit).
In Fisher’s logarithmic series, the expected number of species \(S\) with \(n\) observed individuals is the following:
\[ S_n = \alpha\frac{x^n}{n} \]
with \(S_n\) the number of species with \(n\) individuals, \(x\) a positive constant that you need to solve in an iterative way and \(\alpha\) an empirical constant determining the diversity index. Since \(x\) is empirically bounded between 0.5 and 1, is approximately the number of species with one individual.
fisherfit() function allows us to visualize abundance distributions in certain plots and the associated Fisher fitted lines. Plot the species abundance distribution of community 1, and the fitted log-series distribution
par(mfrow = c(1, 2))
plot(fisherfit(BCI[35, ]), main = "Plot 35", bar.col = "grey")
plot(fisherfit(BCI[7, ]), main = "Plot 7", bar.col = "grey")In this graphic, the height of the bar is the number of species that have a given abundance on abscissa. As seen before, we can see that the abundance distribution of the plots 35 and 7 differ quite a lot, the red line also shows very distinct profile.
We now first decompose how to extract the \(\alpha\) parameter for one plot. The first step is to solve the value of \(x\). To solve \(x\), an iterative procedure has to be done. Iteration means trying different values of \(x\) and stop when both terms of the following equation are equal.
\[ S/N = [\frac{1-x}{x}]\times[-log(1-x)] \] In the case of BCI plot 35, we get:
N35 <- rowSums(BCI)[35]
S35 <- rowSums(BCI_bin)[35]
S35/N35## Plot35
## 0.1381032
x <- 0.95
abs(S35/N35 - (((1-x)/x)*(-log(1-x))))## Plot35
## 0.01956696
x <- 0.958
abs(S35/N35 - (((1-x)/x)*(-log(1-x))))## Plot35
## 0.0008776296
x <- 0.9585
abs(S35/N35 - (((1-x)/x)*(-log(1-x)))) # let's stop here## Plot35
## 0.0003300087
Once \(x\) has been estimated, we can \(\alpha\) by solving the equation: \[ \alpha = \frac{N(1-x)}{x} \] which gives for plot 35:
N35 <- rowSums(BCI)[35]
S35 <- rowSums(BCI_bin)[35]
S35/N35## Plot35
## 0.1381032
alpha35 <- (N35*(1-x))/x
alpha35## Plot35
## 26.02139
Fortunately, we do not have to do all these tedious steps as there is already a ready-to-use function from vegan package. To extract parameter for every community, we can use the function fisher.alpha().
alpha <- fisher.alpha(BCI)
alpha## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8
## 35.67297 30.99091 33.32033 33.92209 37.96423 32.49374 30.58383 33.44981
## Plot9 Plot10 Plot11 Plot12 Plot13 Plot14 Plot15 Plot16
## 35.67236 34.82320 34.20590 34.12041 37.56974 39.21837 35.07926 36.16927
## Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 Plot23 Plot24
## 39.21057 38.71442 46.85170 40.99634 41.58728 35.84847 46.92764 39.87658
## Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32
## 43.53983 36.40222 41.03649 33.65451 35.54363 36.87417 27.62298 32.34465
## Plot33 Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40
## 32.08718 35.12348 26.11065 35.88775 33.28235 29.46136 31.41425 27.17138
## Plot41 Plot42 Plot43 Plot44 Plot45 Plot46 Plot47 Plot48
## 44.06386 33.59822 33.31374 30.28666 29.02041 32.32598 42.56011 35.99618
## Plot49 Plot50
## 35.41942 36.40426
alpha35; alpha[35]## Plot35
## 26.02139
## Plot35
## 26.11065
alpha[7]## Plot7
## 30.58383
We use the same pair plot as before.
# Add Fisher's alpha to all_div data.frame
alpha <- data.frame(plot = names(alpha),
alpha = as.numeric(alpha))
all_div <- merge(all_div, alpha, by = "plot")
# pair plot
pairs.panels(all_div[, c("rich", "shan", "simp", "pielou", "alpha")],
density = FALSE, ellipses = FALSE, hist.col = "grey")Jost, L. (2006) The New Synthesis of Diversity Indices and Similarity
Magurran, A.E. (2013) Measuring biological diversity. John Wiley & Sons.