In this class, we will consider an inventory (i.e. dataset) of tree communities in a tropical forest in Panama and go through the first properties an ecologist/forester can look at.

Before starting
If you would like to see this tutorial in your viewer within RStudio (saves space on the screen), please execute the following chunk of code:

install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson2.html",
              destfile = file.path(dir, "lesson2.html"))
htmlFile <- file.path(dir, "lesson2.html")
rstudioapi::viewer(htmlFile)

1. Site-species matrix

In ecology, or forestry, biodiversity assessment usually consists in, first, designing plots/study sites and, second, identifying the sampled species.

The selected plots should represent environmentally homogeneous spatial units. These units are also called communities.

Local abundances of each species can be recorded. Other features, like functional traits, can be collected or sampled from the present species.

There are several ways to record abundances: counting the number of individuals of every species or estimate their spatial cover are the most common ones. Sometimes, the abundances of species are not measured and only presence/absence data are available.

Digitalizing the field data then creates a so-called site-species matrix. This object usually contains the sites/plots in rows and the species in columns (but be careful, sometimes it can also be the other way around). The matrix is filled with the local abundances of species in plots. If only presence/absence are recorded, then the matrix only contains 1 and 0.

For example, in the following figure, the blue case indicates that the Species 2 has 3 individuals in the Plot 1.

1.1. Barro Colorado Island (BCI)

The Barro Colorado Island (BCI) is the largest island of the canal of Panama. It formed in 1913 when the Chagres River was dammed during canal construction, and, since 1923, the island is a place of intensive scientific research focused on ecology of lowland tropical forest.
From the floristic point of view, it may be the most explored tropical area in the world.

On this island, 50 permanent plots of 1 hectar each were established by the Smithsonian Tropical Research Institute and Princeton University to study the dynamics of tropical forest vegetation.

In these 50 1-ha plots, every tree individual with a diameter at breast height (DBH) > 10 cm was recorded.

1.2. Import in R

If you have not downloaded the data yet, please download the file BCI_Data.csv on Stud-IP and copy it into the data directory in your project folder.

BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # does the file have column names
                  sep = ",", # what separates the column
                  row.names = 1) # does the file have row names

Note that the BCI dataset is also provided in the vegan package. To load data stored in some packages, you have to use the function data() and specify the name of the dataset as an argument.

If you want to load the BCI dataset through vegan run the following code (note that this dataset version has plots as rows and species as column, while in the dataset you downloaded from StupIP is the opposite. In the next examples we will use the dataset from StudIP):

# Installing vegan package on your computer
# install.packages("vegan")
# Loading vegan package in your current session
library(vegan)

# Loading the data
data(BCI)

# Check the objects loaded in the local environment
ls()

Now that the data is loaded, we first examine the structure of the data.

# str() gets the structure of any R object
str(as.matrix(BCI))

##  int [1:225, 1:50] 0 0 0 0 0 0 2 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:225] "Abarema.macradenium" "Acacia.melanoceras" "Acalypha.diversifolia" "Acalypha.macrostachya" ...
##   ..$ : chr [1:50] "Plot1" "Plot2" "Plot3" "Plot4" ...

# dimension of the object
dim(BCI)

## [1] 225  50

# Rows 5 to 7 and columns 4 to 6
BCI[5:7, 4:6]

##                         Plot4 Plot5 Plot6
## Adelia.triloba              3     1     0
## Aegiphila.panamensis        0     1     0
## Alchornea.costaricensis    18     3     2

The dataset has species in rows and plots in columns. To reverse it like in the example figure shown above, we can transpose the matrix using the dedicated function t(). Note that doing this coerce a data.frame into a matrix object.

BCI <- t(BCI)

What information is stored in the rows and columns of the table?

Solution

str(as.matrix(BCI)) # structure

##  int [1:50, 1:225] 0 0 0 0 0 0 0 0 0 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:50] "Plot1" "Plot2" "Plot3" "Plot4" ...
##   ..$ : chr [1:225] "Abarema.macradenium" "Acacia.melanoceras" "Acalypha.diversifolia" "Acalypha.macrostachya" ...

rownames(BCI)[1:5] # row names = plots

## [1] "Plot1" "Plot2" "Plot3" "Plot4" "Plot5"

colnames(BCI)[1:5] # column names = species names

## [1] "Abarema.macradenium"   "Acacia.melanoceras"    "Acalypha.diversifolia"
## [4] "Acalypha.macrostachya" "Adelia.triloba"

How many individuals species 5 to 7 have in plots 4 to 6?

Solution

BCI [4:6, 5:7]

##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              3                    0                      18
## Plot5              1                    1                       3
## Plot6              0                    0                       2

How many sites and species are there in total?

Solution

# Number of rows and columns of the table
dim(BCI) # Dimensions

## [1]  50 225

nrow(BCI); ncol(BCI) # number of rows = number of plots; number of columns = number of species

## [1] 50

## [1] 225

How many individuals are there in total?

Solution

sum(BCI) #Sum of all values = number of all individuals

## [1] 21457

1.3. From abundance to 1/0 data

The site-species matrix is filled with abundance data. The next lines of code create a binary site-species matrix out of it, ony filled with presences and absences.

BCI_bin <- BCI  # Create another object BCI_bin, which is a copy of BCI
BCI_bin[BCI_bin > 0] <- 1 # replace all positive values with 1

BCI[4:6, 5:7] # Abundance values in the "original" dataset

##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              3                    0                      18
## Plot5              1                    1                       3
## Plot6              0                    0                       2

BCI_bin[4:6, 5:7] # abundances have been transformed into presences

##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              1                    0                       1
## Plot5              1                    1                       1
## Plot6              0                    0                       1

2. Richness of the communities

Each forest plot of the site-species matrix has a certain number of individuals and species. Let’s first see how many species and individuals were sampled in the first community.

2.1. One community

# First five species of the first community
head(BCI[1, ])

##   Abarema.macradenium    Acacia.melanoceras Acalypha.diversifolia 
##                     0                     0                     0 
## Acalypha.macrostachya        Adelia.triloba  Aegiphila.panamensis 
##                     0                     0                     0

# Only present species in the first community
head(BCI[1, ][BCI[1, ] > 0])

## Alchornea.costaricensis        Alseis.blackiana         Annona.spraguei 
##                       2                      25                       1 
##           Apeiba.aspera        Apeiba.tibourbou    Astronium.graveolens 
##                      13                       2                       6

How many individual trees were sampled in the first plot?

Solution

# Total number of individuals for first community
sum(BCI[1, ])

## [1] 448

What number of species does the first plot have?

Solution

# Number of species present in community 1
length(BCI[1, ][BCI[1, ] > 0])

## [1] 93

sum(BCI_bin[1, ]) # we can also sum the values from the presence-absence matrix to obtain the same result

## [1] 93

2.2. All the communities

We can repeat these operations over the whole site-species matrix using the function rowSums().

# For all communities: number of individuals
head(rowSums(BCI))

## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 
##   448   435   463   508   505   412

# Sorting the vector, descending order
head(sort(rowSums(BCI), decreasing = TRUE))

## Plot35  Plot4  Plot5 Plot40 Plot10 Plot30 
##    601    508    505    489    483    475

# Sorting the vector, ascending order
head(sort(rowSums(BCI), decreasing = FALSE))

## Plot23 Plot18 Plot29 Plot12 Plot17 Plot28 
##    340    347    364    366    381    387

# Basic summary statistics with summary()
summary(rowSums(BCI))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   340.0   409.0   428.0   429.1   443.5   601.0

We can plot the histogram of the individual frequency over the sampled plots.

# Histogram
hist(rowSums(BCI),
     col = "grey", # colors of bins
     main = "Community sampling", # plot title
     xlab = "Number of individuals") # x-axis title

The previous histogram indicated us how many individuals were sampled in each plot. But it does not tell us how many different species were found in each plot.

Now plot the histogram of species richness across plots.

Solution

Same code but applied on the presence/absence site-species matrix BCI_bin.

hist(rowSums(BCI_bin), col = "grey",
     main = "Species richness", # plot title
     xlab = "Number of species") # x-axis title

2.3. Number of individuals versus species richness

Intuitively, the species richness of a given plot should be positively linked with the number of individuals sampled. Indeed, adding a new individual to a sample can only add a new species to the plot. Thus, the sampling effort can bias the estimation of species richness.

Methods exist to disentangle the sampling bias from the ecological processes in the assessment of species richness. We will see these methods in later classes. Here, we just look whether there is such a positive relationship in BCI data.

To do so, we first construct a table containing the number of individuals and tree species per community.

# Number of individuals per community
ind_com <- rowSums(BCI)
ind_com <- data.frame(com = names(ind_com), # specify the first column that include the plot names
                      nb_ind = as.numeric(ind_com)) # specify the second column that include the number of individuals per plot
head(ind_com)

##     com nb_ind
## 1 Plot1    448
## 2 Plot2    435
## 3 Plot3    463
## 4 Plot4    508
## 5 Plot5    505
## 6 Plot6    412

Now build the equivalent data.frame for species richness (use the presence-absence matric, BCI_bin).

Solution

# Number of species per community
sr_com <- rowSums(BCI_bin)
sr_com <- data.frame(com = names(sr_com),
                     rich = as.numeric(sr_com))
head(sr_com)

##     com rich
## 1 Plot1   93
## 2 Plot2   84
## 3 Plot3   90
## 4 Plot4   94
## 5 Plot5  101
## 6 Plot6   85

We can now merge the two tables using merge(). Then, we plot the relationship between these two features. Note that thatmerge() “unites” two data frames using common columns or row names.

# Merging the two tables
sr_ind <- merge(sr_com, ind_com, by = "com") # the "by" argument indicates the common column (must have the same name in both of the data frame)
head(sr_ind); dim(sr_ind)

##      com rich nb_ind
## 1  Plot1   93    448
## 2 Plot10   94    483
## 3 Plot11   87    401
## 4 Plot12   84    366
## 5 Plot13   93    409
## 6 Plot14   98    438

## [1] 50  3

# Scatterplot
plot(sr_ind$nb_ind, sr_ind$rich,
     pch = 16,
     xlab = "Number of individuals",
     ylab = "Species richness")

The relationship between species richness and number of individuals sampled does not seem that obvious.

To better visualize it, we can also plot the regression line of the relationship between species richness and the number of individual:

# Redo the Scatterplot
plot(sr_ind$nb_ind, sr_ind$rich,
     pch = 16,
     xlab = "Number of individuals",
     ylab = "Species richness")
# abline() adds a line in the current plot, in this case the regression line (intercept and slope) from the linear model betwenn richness and n of individuals   
abline(reg = lm(sr_ind$rich ~ sr_ind$nb_ind))

3. Frequency of species

After characterizing the richness of each forest plot, located on the rows of the site-species matrix, we can study the frequency of each species. So, we here focus on the columns of the site-species matrix.

Let’s start with one species.

3.1. One species

Let’s look at the fifth species. How many individuals does this species have?

Solution

# Total number of individuals for species 5
sum(BCI[, 5])

## [1] 92

In how many plots does it occur?

Solution

# Number of individuals for the species 5 in each of the 50 transects
BCI[, 5]

##  Plot1  Plot2  Plot3  Plot4  Plot5  Plot6  Plot7  Plot8  Plot9 Plot10 Plot11 
##      0      0      0      3      1      0      0      0      5      0      0 
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 
##      1      1      0      2      2      0      1      0      0      0      1 
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33 
##      0      2      0      0      1      0      1     14      5      7      3 
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44 
##      3      6      1      2      6      9      7      0      0      0      4 
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50 
##      0      0      2      1      0      1

BCI[, 5][BCI[, 5] > 0] # only plots where species 5 is present

##  Plot4  Plot5  Plot9 Plot12 Plot13 Plot15 Plot16 Plot18 Plot22 Plot24 Plot27 
##      3      1      5      1      1      2      2      1      1      2      1 
## Plot29 Plot30 Plot31 Plot32 Plot33 Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 
##      1     14      5      7      3      3      6      1      2      6      9 
## Plot40 Plot44 Plot47 Plot48 Plot50 
##      7      4      2      1      1

length(BCI[, 5][BCI[, 5] > 0]) # 27 transects in which species 5 occurs

## [1] 27

# or using the presence/absence matrix
sum(BCI_bin[, 5])

## [1] 27

3.2. Every species

Now, for all species we can sum the abundances/presences by columns to get information about species’ occurrences using colSums() function.

head(colSums(BCI)) # number of individuals of each species

##   Abarema.macradenium    Acacia.melanoceras Acalypha.diversifolia 
##                     1                     3                     2 
## Acalypha.macrostachya        Adelia.triloba  Aegiphila.panamensis 
##                     1                    92                    23

head(sort(colSums(BCI), decreasing = TRUE)) # sorting the vector

##   Faramea.occidentalis  Trichilia.tuberculata       Alseis.blackiana 
##                   1717                   1681                    983 
##      Oenocarpus.mapora       Poulsenia.armata Quararibea.asterolepis 
##                    788                    755                    724

# Integrated function from vegan package
specnumber(BCI) # specnumber finds the number of species in each plot (like we did manually with rowsum)

##  Plot1  Plot2  Plot3  Plot4  Plot5  Plot6  Plot7  Plot8  Plot9 Plot10 Plot11 
##     93     84     90     94    101     85     82     88     90     94     87 
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 
##     84     93     98     93     93     93     89    109    100     99     91 
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33 
##     99     95    105     91     99     85     86     97     77     88     86 
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44 
##     92     83     92     88     82     84     80    102     87     86     81 
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50 
##     81     86    102     91     91     93

3.3. Frequency and occupancy of species

Similarly than for the plot richness, we can characterize the number of occurrences of each species and the number of plots in which they occur.

First, each species has a certain frequency in the site-species matrix. If we use the abundance site-species matrix (BCI), the frequency of the species can here be defined as its number of individuals.

# Number of individuals per species
ind_sp <- colSums(BCI)
ind_sp <- data.frame(sp = names(ind_sp),
                     nb_ind = as.numeric(ind_sp))
head(ind_sp)

##                      sp nb_ind
## 1   Abarema.macradenium      1
## 2    Acacia.melanoceras      3
## 3 Acalypha.diversifolia      2
## 4 Acalypha.macrostachya      1
## 5        Adelia.triloba     92
## 6  Aegiphila.panamensis     23

hist(ind_sp$nb_ind, col = "grey",
     main = "Species frequency",
     xlab = "Number of individuals") # x-axis title

Second, each species occurs in a certain amount of plots. The number of plots in which one species occur define its occupancy.

# Number of species individuals per community
com_sp <- colSums(BCI_bin)
com_sp <- data.frame(sp = names(com_sp),
                     nb_plot = as.numeric(com_sp))
head(com_sp)

##                      sp nb_plot
## 1   Abarema.macradenium       1
## 2    Acacia.melanoceras       2
## 3 Acalypha.diversifolia       2
## 4 Acalypha.macrostachya       1
## 5        Adelia.triloba      27
## 6  Aegiphila.panamensis      18

hist(com_sp$nb_plot, col = "grey",
     main = "Species occupancy",
     xlab = "Number of plots per species") # x-axis title

Now merge the two tables.

Solution

# Merge the two tables
nbplot_ind <- merge(com_sp, ind_sp, by = "sp")
head(nbplot_ind); dim(nbplot_ind)

##                      sp nb_plot nb_ind
## 1   Abarema.macradenium       1      1
## 2    Acacia.melanoceras       2      3
## 3 Acalypha.diversifolia       2      2
## 4 Acalypha.macrostachya       1      1
## 5        Adelia.triloba      27     92
## 6  Aegiphila.panamensis      18     23

## [1] 225   3

You can now plot the link between the species frequency and occupancy.

Solution

# Scatterplot
plot(nbplot_ind$nb_ind, nbplot_ind$nb_plot,
     pch = 16,
     main = "Occupancy vs frequency",
     xlab = "Number of individuals",
     ylab = "Number of plots")

As expected, there is a strong positive relationship between the number of individuals a species has and the number of plot it occurs in.

4. References

A Rain-Forest Census Takes Shape, Tree by Tree, New York Times, C. Ziegler, 2006

Back to Index

Index

Analysis of biodiversity and ecosystem functioning in R

Practical 2: Site-species matrices in R

Pierre Denelle (Main author), Nicola Pavanetto

2026-01-13

1. Site-species matrix

1.1. Barro Colorado Island (BCI)

1.2. Import in R

1.3. From abundance to 1/0 data

2. Richness of the communities

2.1. One community

2.2. All the communities

2.3. Number of individuals versus species richness

3. Frequency of species

3.1. One species

3.2. Every species

3.3. Frequency and occupancy of species

4. References

Back to Index