In this class, we will consider an inventory (i.e. dataset) of tree communities in a tropical forest in Panama and go through the first properties an ecologist/forester can look at.
Before starting
If you would like to see this tutorial in your viewer within RStudio
(saves space on the screen), please execute the following chunk of
code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson2.html",
destfile = file.path(dir, "lesson2.html"))
htmlFile <- file.path(dir, "lesson2.html")
rstudioapi::viewer(htmlFile)In ecology, or forestry, biodiversity assessment usually consists in, first, designing plots/study sites and, second, identifying the sampled species.
The selected plots should represent environmentally homogeneous spatial units. These units are also called communities.
Local abundances of each species can be recorded. Other features, like functional traits, can be collected or sampled from the present species.
There are several ways to record abundances: counting the number of individuals of every species or estimate their spatial cover are the most common ones. Sometimes, the abundances of species are not measured and only presence/absence data are available.
Digitalizing the field data then creates a so-called site-species matrix. This object usually contains the sites/plots in rows and the species in columns (but be careful, sometimes it can also be the other way around). The matrix is filled with the local abundances of species in plots. If only presence/absence are recorded, then the matrix only contains 1 and 0.
For example, in the following figure, the blue case indicates that the Species 2 has 3 individuals in the Plot 1.
The Barro Colorado Island (BCI) is the largest island of the canal of
Panama. It formed in 1913 when the Chagres River was dammed during canal
construction, and, since 1923, the island is a place of intensive
scientific research focused on ecology of lowland tropical forest.
From the floristic point of view, it may be the most explored tropical
area in the world.
On this island, 50 permanent plots of 1 hectar each were established by the Smithsonian Tropical Research Institute and Princeton University to study the dynamics of tropical forest vegetation.
In these 50 1-ha plots, every tree individual with a diameter at breast height (DBH) > 10 cm was recorded.
If you have not downloaded the data yet, please download the file BCI_Data.csv on Stud-IP and copy it into the data directory in your project folder.
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row namesNote that the BCI dataset is also provided in the vegan
package. To load data stored in some packages, you have to use the
function data() and specify the name of the dataset as an
argument.
If you want to load the BCI dataset through vegan run
the following code (note that this dataset version has plots as rows and
species as column, while in the dataset you downloaded from StupIP is
the opposite. In the next examples we will use the dataset from
StudIP):
# Installing vegan package on your computer
# install.packages("vegan")
# Loading vegan package in your current session
library(vegan)
# Loading the data
data(BCI)
# Check the objects loaded in the local environment
ls()Now that the data is loaded, we first examine the structure of the data.
## int [1:225, 1:50] 0 0 0 0 0 0 2 0 0 0 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:225] "Abarema.macradenium" "Acacia.melanoceras" "Acalypha.diversifolia" "Acalypha.macrostachya" ...
## ..$ : chr [1:50] "Plot1" "Plot2" "Plot3" "Plot4" ...
## [1] 225 50
## Plot4 Plot5 Plot6
## Adelia.triloba 3 1 0
## Aegiphila.panamensis 0 1 0
## Alchornea.costaricensis 18 3 2
The dataset has species in rows and plots in
columns. To reverse it like in the example figure shown above,
we can transpose the matrix using the dedicated function
t(). Note that doing this coerce a data.frame into a matrix
object.
## int [1:50, 1:225] 0 0 0 0 0 0 0 0 0 1 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:50] "Plot1" "Plot2" "Plot3" "Plot4" ...
## ..$ : chr [1:225] "Abarema.macradenium" "Acacia.melanoceras" "Acalypha.diversifolia" "Acalypha.macrostachya" ...
## [1] "Plot1" "Plot2" "Plot3" "Plot4" "Plot5"
## [1] "Abarema.macradenium" "Acacia.melanoceras" "Acalypha.diversifolia"
## [4] "Acalypha.macrostachya" "Adelia.triloba"
## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 3 0 18
## Plot5 1 1 3
## Plot6 0 0 2
## [1] 50 225
## [1] 50
## [1] 225
The site-species matrix is filled with abundance data. The next lines of code create a binary site-species matrix out of it, ony filled with presences and absences.
BCI_bin <- BCI # Create another object BCI_bin, which is a copy of BCI
BCI_bin[BCI_bin > 0] <- 1 # replace all positive values with 1
BCI[4:6, 5:7] # Abundance values in the "original" dataset ## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 3 0 18
## Plot5 1 1 3
## Plot6 0 0 2
## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 1 0 1
## Plot5 1 1 1
## Plot6 0 0 1
Each forest plot of the site-species matrix has a certain number of individuals and species. Let’s first see how many species and individuals were sampled in the first community.
## Abarema.macradenium Acacia.melanoceras Acalypha.diversifolia
## 0 0 0
## Acalypha.macrostachya Adelia.triloba Aegiphila.panamensis
## 0 0 0
## Alchornea.costaricensis Alseis.blackiana Annona.spraguei
## 2 25 1
## Apeiba.aspera Apeiba.tibourbou Astronium.graveolens
## 13 2 6
How many individual trees were sampled in the first plot?
What number of species does the first plot have?
## [1] 93
sum(BCI_bin[1, ]) # we can also sum the values from the presence-absence matrix to obtain the same result## [1] 93
We can repeat these operations over the whole site-species matrix
using the function rowSums().
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 448 435 463 508 505 412
## Plot35 Plot4 Plot5 Plot40 Plot10 Plot30
## 601 508 505 489 483 475
## Plot23 Plot18 Plot29 Plot12 Plot17 Plot28
## 340 347 364 366 381 387
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 340.0 409.0 428.0 429.1 443.5 601.0
We can plot the histogram of the individual frequency over the sampled plots.
# Histogram
hist(rowSums(BCI),
col = "grey", # colors of bins
main = "Community sampling", # plot title
xlab = "Number of individuals") # x-axis titleThe previous histogram indicated us how many individuals were sampled in each plot. But it does not tell us how many different species were found in each plot.
Now plot the histogram of species richness across plots.
Same code but applied on the presence/absence site-species matrix
BCI_bin.
hist(rowSums(BCI_bin), col = "grey",
main = "Species richness", # plot title
xlab = "Number of species") # x-axis titleIntuitively, the species richness of a given plot should be positively linked with the number of individuals sampled. Indeed, adding a new individual to a sample can only add a new species to the plot. Thus, the sampling effort can bias the estimation of species richness.
Methods exist to disentangle the sampling bias from the ecological processes in the assessment of species richness. We will see these methods in later classes. Here, we just look whether there is such a positive relationship in BCI data.
To do so, we first construct a table containing the number of individuals and tree species per community.
# Number of individuals per community
ind_com <- rowSums(BCI)
ind_com <- data.frame(com = names(ind_com), # specify the first column that include the plot names
nb_ind = as.numeric(ind_com)) # specify the second column that include the number of individuals per plot
head(ind_com)## com nb_ind
## 1 Plot1 448
## 2 Plot2 435
## 3 Plot3 463
## 4 Plot4 508
## 5 Plot5 505
## 6 Plot6 412
Now build the equivalent data.frame for species richness
(use the presence-absence matric, BCI_bin).
# Number of species per community
sr_com <- rowSums(BCI_bin)
sr_com <- data.frame(com = names(sr_com),
rich = as.numeric(sr_com))
head(sr_com)## com rich
## 1 Plot1 93
## 2 Plot2 84
## 3 Plot3 90
## 4 Plot4 94
## 5 Plot5 101
## 6 Plot6 85
We can now merge the two tables using merge(). Then, we
plot the relationship between these two features. Note that
thatmerge() “unites” two data frames using common columns
or row names.
# Merging the two tables
sr_ind <- merge(sr_com, ind_com, by = "com") # the "by" argument indicates the common column (must have the same name in both of the data frame)
head(sr_ind); dim(sr_ind)## com rich nb_ind
## 1 Plot1 93 448
## 2 Plot10 94 483
## 3 Plot11 87 401
## 4 Plot12 84 366
## 5 Plot13 93 409
## 6 Plot14 98 438
## [1] 50 3
# Scatterplot
plot(sr_ind$nb_ind, sr_ind$rich,
pch = 16,
xlab = "Number of individuals",
ylab = "Species richness")The relationship between species richness and number of individuals sampled does not seem that obvious.
To better visualize it, we can also plot the regression line of the relationship between species richness and the number of individual:
# Redo the Scatterplot
plot(sr_ind$nb_ind, sr_ind$rich,
pch = 16,
xlab = "Number of individuals",
ylab = "Species richness")
# abline() adds a line in the current plot, in this case the regression line (intercept and slope) from the linear model betwenn richness and n of individuals
abline(reg = lm(sr_ind$rich ~ sr_ind$nb_ind))After characterizing the richness of each forest plot, located on the rows of the site-species matrix, we can study the frequency of each species. So, we here focus on the columns of the site-species matrix.
Let’s start with one species.
In how many plots does it occur?
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8 Plot9 Plot10 Plot11
## 0 0 0 3 1 0 0 0 5 0 0
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22
## 1 1 0 2 2 0 1 0 0 0 1
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33
## 0 2 0 0 1 0 1 14 5 7 3
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44
## 3 6 1 2 6 9 7 0 0 0 4
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50
## 0 0 2 1 0 1
## Plot4 Plot5 Plot9 Plot12 Plot13 Plot15 Plot16 Plot18 Plot22 Plot24 Plot27
## 3 1 5 1 1 2 2 1 1 2 1
## Plot29 Plot30 Plot31 Plot32 Plot33 Plot34 Plot35 Plot36 Plot37 Plot38 Plot39
## 1 14 5 7 3 3 6 1 2 6 9
## Plot40 Plot44 Plot47 Plot48 Plot50
## 7 4 2 1 1
## [1] 27
## [1] 27
Now, for all species we can sum the abundances/presences by columns
to get information about species’ occurrences using
colSums() function.
## Abarema.macradenium Acacia.melanoceras Acalypha.diversifolia
## 1 3 2
## Acalypha.macrostachya Adelia.triloba Aegiphila.panamensis
## 1 92 23
## Faramea.occidentalis Trichilia.tuberculata Alseis.blackiana
## 1717 1681 983
## Oenocarpus.mapora Poulsenia.armata Quararibea.asterolepis
## 788 755 724
# Integrated function from vegan package
specnumber(BCI) # specnumber finds the number of species in each plot (like we did manually with rowsum)## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8 Plot9 Plot10 Plot11
## 93 84 90 94 101 85 82 88 90 94 87
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22
## 84 93 98 93 93 93 89 109 100 99 91
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33
## 99 95 105 91 99 85 86 97 77 88 86
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44
## 92 83 92 88 82 84 80 102 87 86 81
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50
## 81 86 102 91 91 93
Similarly than for the plot richness, we can characterize the number of occurrences of each species and the number of plots in which they occur.
First, each species has a certain frequency in the site-species matrix. If we use the abundance site-species matrix (BCI), the frequency of the species can here be defined as its number of individuals.
# Number of individuals per species
ind_sp <- colSums(BCI)
ind_sp <- data.frame(sp = names(ind_sp),
nb_ind = as.numeric(ind_sp))
head(ind_sp)## sp nb_ind
## 1 Abarema.macradenium 1
## 2 Acacia.melanoceras 3
## 3 Acalypha.diversifolia 2
## 4 Acalypha.macrostachya 1
## 5 Adelia.triloba 92
## 6 Aegiphila.panamensis 23
hist(ind_sp$nb_ind, col = "grey",
main = "Species frequency",
xlab = "Number of individuals") # x-axis titleSecond, each species occurs in a certain amount of plots. The number of plots in which one species occur define its occupancy.
# Number of species individuals per community
com_sp <- colSums(BCI_bin)
com_sp <- data.frame(sp = names(com_sp),
nb_plot = as.numeric(com_sp))
head(com_sp)## sp nb_plot
## 1 Abarema.macradenium 1
## 2 Acacia.melanoceras 2
## 3 Acalypha.diversifolia 2
## 4 Acalypha.macrostachya 1
## 5 Adelia.triloba 27
## 6 Aegiphila.panamensis 18
hist(com_sp$nb_plot, col = "grey",
main = "Species occupancy",
xlab = "Number of plots per species") # x-axis title# Merge the two tables
nbplot_ind <- merge(com_sp, ind_sp, by = "sp")
head(nbplot_ind); dim(nbplot_ind)## sp nb_plot nb_ind
## 1 Abarema.macradenium 1 1
## 2 Acacia.melanoceras 2 3
## 3 Acalypha.diversifolia 2 2
## 4 Acalypha.macrostachya 1 1
## 5 Adelia.triloba 27 92
## 6 Aegiphila.panamensis 18 23
## [1] 225 3
# Scatterplot
plot(nbplot_ind$nb_ind, nbplot_ind$nb_plot,
pch = 16,
main = "Occupancy vs frequency",
xlab = "Number of individuals",
ylab = "Number of plots")As expected, there is a strong positive relationship between the number of individuals a species has and the number of plot it occurs in.