In this practical, we will go through the main diversity indices used in ecology to compare plots, based on the abundance distribution of the species they host. For example, the two following plots have the same species richness and number of individuals, but the abundances of the three species is not distributed in the same way. How to rank their diversity?


How to compare these two plots?

How to compare these two plots?


Before starting
If you would like to see this practical in your viewer within RStudio, please execute the following chunk of code:

install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson4.html",
              destfile = file.path(dir, "lesson4.html"))
htmlFile <- file.path(dir, "lesson4.html")
rstudioapi::viewer(htmlFile)

We will use the same dataset for this practical than for the practical 2, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset.

1. Species richness and rank-abundance curves

1.1. Species richness

Let us first import the site-species matrix from BCI data.

BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # does the file have column names
                  sep = ",", # what separates the column
                  row.names = 1) # does the file have row names
BCI <- t(BCI) # transposing the matrix to have sites in rows

BCI[4:6, 5:7]
##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              3                    0                      18
## Plot5              1                    1                       3
## Plot6              0                    0                       2

The matrix is filled with abundances. We here create a binary site-species matrix to access in an easier way the species richness of each plot.

BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
BCI_bin[4:6, 5:7]
##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              1                    0                       1
## Plot5              1                    1                       1
## Plot6              0                    0                       1

Now, from this presence/absence matrix, we can count how many different species each plot has. This is a first measure of diversity.

# Histogram
hist(rowSums(BCI_bin), col = "grey",
     main = expression(paste(alpha, " richness")), # plot title
     xlab = "Number of species") # x-axis title

However, as we saw in the lecture, species richness is limited to rank the diversity of several plots. We should account for the evenness of species in each plot.

1.2. Rank-abundance curves

Rank-abundance distribution (RAD) is a classic way to represent how the abundances are structured in a community. Note that the rank-abundance curves are also called species-abundance distribution in the literature (SAD, McGill et al. 2007).

The function rank() returns the rank of any feature and can therefore be used to rank the abundances of species within their communities.

rank(c(10, 2, 5))
## [1] 3 1 2
# ascending order
rank(-c(10, 2, 5))
## [1] 1 3 2
# tie-method
rank(c(1, 10, 1))
## [1] 1.5 3.0 1.5
rank(c(1, 10, 1), ties.method = "random")
## [1] 2 3 1

Let us take two plots (plot 35 and plot 7) with similar richness as examples to illustrate how their abundances can be shaped in a very different way.

# Extract the first community from the site-species table
ex_plot1 <- data.frame(sp = colnames(BCI),
                       ab = as.numeric(BCI[35, ]))
head(ex_plot1)
##                      sp ab
## 1   Abarema.macradenium  0
## 2    Acacia.melanoceras  0
## 3 Acalypha.diversifolia  0
## 4 Acalypha.macrostachya  0
## 5        Adelia.triloba  6
## 6  Aegiphila.panamensis  0
# Removal of absent species
ex_plot1 <- ex_plot1[which(ex_plot1$ab != 0), ]
dim(ex_plot1)
## [1] 83  2
# Add rank of species in the first community
# ?rank # help page of a given function
ex_plot1$rank <- rank(-ex_plot1$ab, ties.method = "random")

# Ordering data before plotting
ex_plot1 <- ex_plot1[order(ex_plot1$rank), ]

# Plot
plot(ex_plot1$rank, ex_plot1$ab, type = "b",
     col = rgb(0.4, 0.4, 0.8, 0.6), pch = 16, lwd = 1,
     main = "RAD",
     xlab = "Rank", ylab = "Abundances")

# Let's add community 7 on this plot for comparison
ex_plot2 <- data.frame(sp = colnames(BCI),
                       ab = as.numeric(BCI[7, ]))
ex_plot2 <- ex_plot2[which(ex_plot2$ab != 0), ]
ex_plot2$rank <- rank(-ex_plot2$ab, ties.method = "random")
ex_plot2 <- ex_plot2[order(ex_plot2$rank), ]

points(ex_plot2$rank, ex_plot2$ab, pch = 16, type = "both",
       col = rgb(1, 0.4, 0.2, 0.6))
legend(40, 250, c("35 (83 species)", "7 (82 species)"), title = "Plot",
       text.font = 3, lwd = 2, seg.len = 1,
       col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
       lty = 1, bg = "gray90")

As for the majority of communities of living organisms in the world, the first BCI plot contains many rare species and just a few common species. The second one also follows this trend but in a less straightforward way as its abundances seem more even. This visual impression can also arise from histograms.

par(mfrow = c(1, 2))
hist(ex_plot1$ab, main = "Plot 35", xlab = "Abundances", col = "grey")
hist(ex_plot2$ab, main = "Plot 7", xlab = "Abundances", col = "grey")

Rank-abundance distributions can also be visualized using relative abundances. Relative abundances are local abundances divided by the total number of individuals in a community, such as it sums to 1 in every community.

\[ \sum_{i=1}^{n}a_{i} = 1 \]

with \(n\) the number of species in a community and \(a_i\) the relative abundance of species \(i\).

Create a BCI matrix filled with relative abundances.

Solution
# Create matrix with relative abundances
BCI_rel <- BCI/rowSums(BCI)


# Our three types of matrix
head(rowSums(BCI)); head(rowSums(BCI_bin)); head(rowSums(BCI_rel))
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 
##   448   435   463   508   505   412
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 
##    93    84    90    94   101    85
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 
##     1     1     1     1     1     1
# Add relative abundances
ex_plot1 <- data.frame(sp = colnames(BCI),
                       ab = as.numeric(BCI[35, ]))
ex_plot1$rank <- rank(-ex_plot1$ab, ties.method = "random")
ex_plot1 <- ex_plot1[which(ex_plot1$ab != 0), ]
ex_plot1$ab_rel <- as.numeric(BCI_rel[35, ][BCI_rel[35, ] > 0])
ex_plot1$rank_rel <- rank(-ex_plot1$ab_rel, ties.method = "random")
ex_plot1 <- ex_plot1[order(ex_plot1$rank_rel), ]

ex_plot2 <- data.frame(sp = colnames(BCI),
                       ab = as.numeric(BCI[7, ]))
ex_plot2$rank <- rank(-ex_plot2$ab, ties.method = "random")
ex_plot2 <- ex_plot2[which(ex_plot2$ab != 0), ]
ex_plot2$ab_rel <- as.numeric(BCI_rel[7, ][BCI_rel[7, ] > 0])
ex_plot2$rank_rel <- rank(-ex_plot2$ab_rel, ties.method = "random")
ex_plot2 <- ex_plot2[order(ex_plot2$rank_rel), ]

plot(ex_plot1$rank_rel, ex_plot1$ab_rel, type = "b",
     col = rgb(0.4, 0.4, 0.8, 0.6), pch = 16, lwd = 1,
     main = "RAD, relative scales",
     xlab = "Rank",
     ylab = "Relative abundances")
points(ex_plot2$rank_rel, ex_plot2$ab_rel, pch = 16, type = "both",
       col = rgb(1, 0.4, 0.2, 0.6))
legend(40, 0.4, c("35 (83 species)", "7 (82 species)"), title = "Plot",
       text.font = 3, lwd = 2, seg.len = 1,
       col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
       lty = 1, bg = "gray90")

In the next parts, we quantify the evenness of abundances of plots using two family of indices: the non-parametric and parametric indices.

2. Non-parametric indices

Non-parametric indices do not rely on any statistical distribution. One of the most classical index in ecology is the Shannon index.

2.1. Shannon

Shannon’s diversity H' is based on the proportion of individuals each species has in a community. For a species \(i\), its proportion of individuals \(p_i\) in a plot having \(N\) individuals writes as follow:

\[ p_i = \frac{n_i}{N} \]

From these proportions, the formula of Shannon is the following:

\[ H' = -\sum_{i=1}^Sp_ilog(p_i) \]

with \(S\) the species richness and \(p_i\) the proportion of individuals for species \(i\).

The idea behind Shannon’s index is that if these proportions are uneven across species, the Shannon’s index decreases.

Let’s take a simple virtual example to see how Shannon behaves. We here create a virtual site-species matrix with 4 communities and 4 species. In the first community, the four species have 10 individuals. In the other communities, the first species is more and more abundant, making the whole abundance distribution more and more uneven.
The function diversity() from vegan package enables us to calculate the index of Shannon.

virt <- matrix(10, nrow = 4, ncol = 4)
virt <- matrix(10, nrow = 4, ncol = 6,
               dimnames = list(c("plot1", "plot2", "plot3", "plot4"),
                               c("sp1", "sp2", "sp3", "sp4", "sp5", "sp6")))
virt[, 1] <- c(10, 100, 1000, 10000)

library(vegan)
diversity(virt, index = "shannon")
##      plot1      plot2      plot3      plot4 
## 1.79175947 1.17299347 0.26808398 0.03935448

Furthermore, the Shannon index increases if the number of species increases while the abundance distribution remains the same. For example, if we add a fifth community where not all species are present:

virt <- rbind(virt, plot5 = c(0, 0, 0, 10, 10, 10))
virt
##         sp1 sp2 sp3 sp4 sp5 sp6
## plot1    10  10  10  10  10  10
## plot2   100  10  10  10  10  10
## plot3  1000  10  10  10  10  10
## plot4 10000  10  10  10  10  10
## plot5     0   0   0  10  10  10
diversity(virt, index = "shannon")[c(1, 5)]
##    plot1    plot5 
## 1.791759 1.098612

Let us now calculate this index on BCI data.

shan <- diversity(BCI, index = "shannon")

# Histogram
hist(shan, col = "grey", main = "Shannon diversity", xlab = "Value")

One community has a very low Shannon index value, and therefore uneven distribution of abundances.

2.2. Pielou’s evenness

The Pielou’s index of evenness (or equitability) specifically quantifies the extent to which species abundances are unbalanced in the community. It ranges from 0 to 1 (abundances completely even), and decreases with the heterogeneity of abundances in one community.

Its formula is based on Shannon’s index:

\[ J' = \frac{H'}{H'_{max}} = \frac{H'}{log(S)} \] with \(S\) the species richness.

Using diversity() and specnumber() we can calculate it on BCI data:

pielou <- diversity(BCI, index = "shannon")/log(specnumber(BCI))

# Histogram
hist(pielou, col = "grey", main = "Pielou's evenness", xlab = "Value")

2.3. Simpson

The second most used index to measure evenness in ecology is probably the Simpson’s index. Like for Shannon, it is based on the proportion of individuals each species has in a community. Its formula is however different:

\[ D = 1 -\sum_{i=1}^Sp_i^2 \]

with \(p_i\) the proportion of individuals for species \(i\) and \(S\) the species richness.

We can calculate it on BCI data using the same diversity() function but with another argument.

simp <- diversity(BCI, index = "simpson")
invsimp <- diversity(BCI, "inv")  # Inverse Simpson diversity

# Histogram
hist(simp, col = "grey", main = "Simpson diversity", xlab = "Value")

The distribution of values also highlights a very uneven plot.

3. Hill numbers

As discussed in the lecture, the above diversity indices have some undesirable characteristics. For example, doubling the number of species with equal distribution of abundances does not necessarily lead to a doubling of the index values (non-linear properties). Furthermore, a community with \(S\) equally common species does not necessarily have a diversity of \(S\) and the unit of the index is not species, although both would be desirable. One solution to this problem is to convert the values into effective numbers of species, i.e. the number of equally common species that would be needed to reach a given index value.

The indices discussed can be transformed into effective numbers of species to meet the above characteristics.

In the case of Shannon \(H\), the transformation is achieved by doing \(e^H\).

In the case of Simpson \(D\), \(\frac{1}{1-D}\).

We will look at the properties of the indices using our artificially generated species-by-sites matrix:

virt
##         sp1 sp2 sp3 sp4 sp5 sp6
## plot1    10  10  10  10  10  10
## plot2   100  10  10  10  10  10
## plot3  1000  10  10  10  10  10
## plot4 10000  10  10  10  10  10
## plot5     0   0   0  10  10  10
diversity(virt, index = "shannon")
##      plot1      plot2      plot3      plot4      plot5 
## 1.79175947 1.17299347 0.26808398 0.03935448 1.09861229
exp(diversity(virt, index = "shannon"))
##    plot1    plot2    plot3    plot4    plot5 
## 6.000000 3.231652 1.307457 1.040139 3.000000
diversity(virt, index = "simpson")
##       plot1       plot2       plot3       plot4       plot5 
## 0.833333333 0.533333333 0.092517007 0.009920547 0.666666667
1/(1-diversity(virt, index = "simpson"))
##    plot1    plot2    plot3    plot4    plot5 
## 6.000000 2.142857 1.101949 1.010020 3.000000

The transformation leads to a situation in which the diversity in communities with constant abundances corresponds to the number of species and, accordingly, a community with twice the number of species has twice the diversity with the same abundance.

However, the values of the two indices differ for communities with non-constant abundances, as Simpson gives greater weight to abundances than Shannon. Hill numbers offers a generalisation of this weighting of abundances while retaining the desired characteristics. The calculation is done as

\[ D^q = \biggl(\sum_{i=1}^Sp_i^{q}\biggr)^\frac{1}{(1-q)} \]

where \(p_i\) is the relative abundance of the species \(i\) and \(S\) is the total number of species. The parameter \(q\) determines the sensitivity of the index to the abundances:

\(q = 0\) corresponds to the number of species,
\(q = 1\) corresponds to \(e^H\) (Shannon) and
\(q = 2\) corresponds to \(\frac{1}{1-D}\) (Simpson).

However, other values for \(q\) can also be selected. The function hill_taxa() from hillR package allows us to do so:

library(hillR)
hill_taxa(BCI, q = 0) # species richness
##  Plot1  Plot2  Plot3  Plot4  Plot5  Plot6  Plot7  Plot8  Plot9 Plot10 Plot11 
##     93     84     90     94    101     85     82     88     90     94     87 
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 
##     84     93     98     93     93     93     89    109    100     99     91 
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33 
##     99     95    105     91     99     85     86     97     77     88     86 
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44 
##     92     83     92     88     82     84     80    102     87     86     81 
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50 
##     81     86    102     91     91     93
identical(hill_taxa(BCI, q = 0), specnumber(BCI))
## [1] TRUE
# With BCI
hill_BCI <- data.frame(q = seq(0, 3, by = 0.25),
                       Diversity35 = NA,
                       Diversity7 = NA)

for (i in 1:nrow(hill_BCI)){
  hill_BCI[i, c("Diversity35", "Diversity7")] <- hill_taxa(BCI[c(35, 7),],
                                                           q = hill_BCI$q[i])
}

plot(hill_BCI$q, hill_BCI$Diversity7, type = "l",
     xlab = "q", ylab = "Diversity",
     col = rgb(1, 0.4, 0.2, 0.6), ylim = c(0, 90))

points(hill_BCI$q, hill_BCI$Diversity35, type = "l",
       xlab = "q", ylab = "Diversity", col = rgb(0.4, 0.4, 0.8, 0.6))

legend(1.7, 85, c("Plot 35", "Plot 7"), title = "Plot",
       text.font = 3, lwd = 2, seg.len = 1,
       col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
       lty = 1)

Plot 7 has a greater diversity than plot 35 at any \(q\), even if they have a comparable species richness.

4. Parametric indices

4.1. Fisher’s alpha

The Fisher’s \(\alpha\) index is a parameter describing the standard log-series distribution that can be fitted to the observed pattern of relative abundances. It is widely used as a diversity measure and is one of the most recommended when the number of individuals in a plot exceeds 1000.

Note that other distributions can be fitted to an abundance distribution and that log-series is not the best one measures but we won’t do it in this practical (for more details, see ?radfit).

In Fisher’s logarithmic series, the expected number of species \(S\) with \(n\) observed individuals is the following:

\[ S_n = \alpha\frac{x^n}{n} \]

with \(S_n\) the number of species with \(n\) individuals, \(x\) a positive constant that you need to solve in an iterative way and \(\alpha\) an empirical constant determining the diversity index. Since \(x\) is empirically bounded between 0.5 and 1, is approximately the number of species with one individual.

fisherfit() function allows us to visualize abundance distributions in certain plots and the associated Fisher fitted lines. Plot the species abundance distribution of community 1, and the fitted log-series distribution

par(mfrow = c(1, 2))
plot(fisherfit(BCI[35, ]), main = "Plot 35", bar.col = "grey")
plot(fisherfit(BCI[7, ]), main = "Plot 7", bar.col = "grey")

In this graphic, the height of the bar is the number of species that have a given abundance on abscissa. As seen before, we can see that the abundance distribution of the plots 35 and 7 differ quite a lot, the red line also shows very distinct profile.

We now first decompose how to extract the \(\alpha\) parameter for one plot. The first step is to solve the value of \(x\). To solve \(x\), an iterative procedure has to be done. Iteration means trying different values of \(x\) and stop when both terms of the following equation are equal.

\[ S/N = [\frac{1-x}{x}]\times[-log(1-x)] \] In the case of BCI plot 35, we get:

N35 <- rowSums(BCI)[35]
S35 <- rowSums(BCI_bin)[35]
S35/N35
##    Plot35 
## 0.1381032
x <- 0.95
abs(S35/N35 - (((1-x)/x)*(-log(1-x))))
##     Plot35 
## 0.01956696
x <- 0.958
abs(S35/N35 - (((1-x)/x)*(-log(1-x))))
##       Plot35 
## 0.0008776296
x <- 0.9585
abs(S35/N35 - (((1-x)/x)*(-log(1-x)))) # let's stop here
##       Plot35 
## 0.0003300087

Once \(x\) has been estimated, we can \(\alpha\) by solving the equation: \[ \alpha = \frac{N(1-x)}{x} \] which gives for plot 35:

N35 <- rowSums(BCI)[35]
S35 <- rowSums(BCI_bin)[35]
S35/N35
##    Plot35 
## 0.1381032
alpha35 <- (N35*(1-x))/x
alpha35
##   Plot35 
## 26.02139

Fortunately, we do not have to do all these tedious steps as there is already a ready-to-use function from vegan package. To extract parameter for every community, we can use the function fisher.alpha().

alpha <- fisher.alpha(BCI)
alpha
##    Plot1    Plot2    Plot3    Plot4    Plot5    Plot6    Plot7    Plot8 
## 35.67297 30.99091 33.32033 33.92209 37.96423 32.49374 30.58383 33.44981 
##    Plot9   Plot10   Plot11   Plot12   Plot13   Plot14   Plot15   Plot16 
## 35.67236 34.82320 34.20590 34.12041 37.56974 39.21837 35.07926 36.16927 
##   Plot17   Plot18   Plot19   Plot20   Plot21   Plot22   Plot23   Plot24 
## 39.21057 38.71442 46.85170 40.99634 41.58728 35.84847 46.92764 39.87658 
##   Plot25   Plot26   Plot27   Plot28   Plot29   Plot30   Plot31   Plot32 
## 43.53983 36.40222 41.03649 33.65451 35.54363 36.87417 27.62298 32.34465 
##   Plot33   Plot34   Plot35   Plot36   Plot37   Plot38   Plot39   Plot40 
## 32.08718 35.12348 26.11065 35.88775 33.28235 29.46136 31.41425 27.17138 
##   Plot41   Plot42   Plot43   Plot44   Plot45   Plot46   Plot47   Plot48 
## 44.06386 33.59822 33.31374 30.28666 29.02041 32.32598 42.56011 35.99618 
##   Plot49   Plot50 
## 35.41942 36.40426
alpha35; alpha[35]
##   Plot35 
## 26.02139
##   Plot35 
## 26.11065
alpha[7]
##    Plot7 
## 30.58383

Back to Index

Index