Preparation: Please open your RStudio project and
download the new data (‘Phylogeny_ConservativeCon_Checklist.zip’,
‘palms_specsxsites_phylo.csv’ and ‘palmtree_pruned.nex’) for today from
Stud-IP
and unzip and copy them into your .Rproj folder ‘/Data/’. You can use
getwd()
to locate your current working directory, which
should be your project folder. Please install the following R-packages
using install.packages()
:
ape
TNRS
remotes
sf
dplyr
psych
picante
pez
ggplot2
cowplot
GGally
RColorBrewer
bioregion
If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:
install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day7.html", destfile = file.path(dir, "Day7.html"))
htmlFile <- file.path(dir, "Day7.html")
rstudioapi::viewer(htmlFile)
Now you can conveniently copy code from the viewer into your
script.
Load R packages & island data set
library("dplyr") # data manipulation
library("sf") # Geospatial library
library("TNRS") # Taxonomic standardisation of plant species names
# library("rWCVP") # Taxonomic standardisation of plant species names
# library("rWCVPdata") # According data for standardization
library("ape") # Analyses of phylogenetics and evolution
library("picante") # Phylogenies and ecology
library("pez") # Phylogenies and ecology
library("psych") # basic stats stuff
library("ggplot2") # for plotting
library("cowplot") # for multi-panel plots
library("GGally") # pairs plot
library("RColorBrewer") # colour palettes
library("bioregion") # bioregionalization package used to switch data formats
load multi-tree object into R
Data from Faurby, S., Eiserhardt, W.L., Baker, W.J. & Svenning,
J.-C. (2016).
An all-evidence species-level supertree for the palms (Arecaceae).
Molecular Phylogenetics and Evolution, 100, 57-69.
palmtrees <- read.nexus("data/Phylogeny_ConservativeCon_Checklist/Phylogeny_ConservativeCon_Checklist.nex")
palmtrees
## 1000 phylogenetic trees
palmtrees[[1]]
##
## Phylogenetic tree with 2539 tips and 2538 internal nodes.
##
## Tip labels:
## Nypa_fruticans, Sabal_mauritiiformis, Sabal_pumos, Sabal_domingensis, Sabal_minor, Sabal_palmetto, ...
##
## Rooted; includes branch length(s).
palmtree <- palmtrees[[1]]
str(palmtree)
## List of 4
## $ edge : int [1:5076, 1:2] 2540 2541 2541 2542 2543 2544 2545 2546 2546 2547 ...
## $ edge.length: num [1:5076] 8.89 95.72 9.44 9.78 12.25 ...
## $ Nnode : int 2538
## $ tip.label : chr [1:2539] "Nypa_fruticans" "Sabal_mauritiiformis" "Sabal_pumos" "Sabal_domingensis" ...
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
Explanation of structure of a phylogeny in R
edge: a two-column matrix where each row represents a
branch (or edge) of the tree. The nodes and the tips are symbolized with
integers. The n tips are numbered from 1 to n, and the m (internal)
nodes from n+1 to n+m (the root being n + 1). For each row, the first
column gives the ancestor.
edge.length (optional): A numeric vector giving the lengths of the branches given by edge.
tip.label: A vector of mode character giving the labels of the tips. The order of these labels corresponds to the integers 1 to n in edge.
Nnode: An integer value giving the number of nodes in the tree (m).
node.label (optional): A vector of mode character giving the labels of the nodes (ordered in the same way as the tip.label).
root.edge (optional): A numeric value giving the length of the branch at the root if it exists.
# Number of nodes (m) (n is the number of tips)
palmtree$Nnode
## [1] 2538
# Tip labels
head(palmtree$tip.label)
## [1] "Nypa_fruticans" "Sabal_mauritiiformis" "Sabal_pumos"
## [4] "Sabal_domingensis" "Sabal_minor" "Sabal_palmetto"
length(palmtree$tip.label)
## [1] 2539
# Edges
head(palmtree$edge)
## [,1] [,2]
## [1,] 2540 2541
## [2,] 2541 1
## [3,] 2541 2542
## [4,] 2542 2543
## [5,] 2543 2544
## [6,] 2544 2545
range(palmtree$edge[, 1]) # ranges from n+1 to n+m (length(palmtree$tip.label) + palmtree$Nnode)
## [1] 2540 5077
is.rooted(palmtree) # does the phylogeny have a root? i.e. can the most basal ancestor be identified?
## [1] TRUE
We can also plot the phylogenetic tree.
plot(palmtree, type = "fan", cex = 0.3,
edge.color = "gray70", tip.color = "#ef8a62",
main = paste("Phylogenetic tree for", palmtree$Nnode, "palm species"))
Data from Kreft, H., Sommer, J.H. & Barthlott, W. (2006).
The significance of geographic range size for spatial diversity
patterns in Neotropical palms. Ecography, 29, 21-30.
species <- read.csv("data/palms_species_per_gridcell.csv",
sep = ",", stringsAsFactors = FALSE)
head(species)
## grid_id spp_Id GENUS EPITHET
## 1 13735 550 Ammandra natalia
## 2 13736 550 Ammandra natalia
## 3 13737 550 Ammandra natalia
## 4 13738 550 Ammandra natalia
## 5 13910 550 Ammandra natalia
## 6 13911 550 Ammandra natalia
length(unique(species$grid_id)) # number of grid cells
## [1] 6638
Add a column for the full species name with “_” as separator: “species”.
species$species <- paste(species$GENUS, species$EPITHET, sep = "_")
species$species <- paste0(species$GENUS, "_", species$EPITHET)
unique(species$species)[1:5]; head(species)
## [1] "Ammandra_natalia" "Ammandra_decasperma"
## [3] "Ammandra_dasyneura" "Phytelephas_tumacana"
## [5] "Phytelephas_tenuicaulis"
## grid_id spp_Id GENUS EPITHET species
## 1 13735 550 Ammandra natalia Ammandra_natalia
## 2 13736 550 Ammandra natalia Ammandra_natalia
## 3 13737 550 Ammandra natalia Ammandra_natalia
## 4 13738 550 Ammandra natalia Ammandra_natalia
## 5 13910 550 Ammandra natalia Ammandra_natalia
## 6 13911 550 Ammandra natalia Ammandra_natalia
How many species in the palm distribution dataset are missing from
the phylogeny?
length(unique(species$species))
## [1] 550
table(unique(species$species) %in% palmtree$tip.label) # need for unique species
##
## FALSE TRUE
## 80 470
# length(unique(species$species[which(!species$species %in% palmtree$tip.label)]))
# sum(!unique(species$species) %in% palmtree$tip.label)
Which species are missing?
# first five palm species
unique(species$species[which(!species$species %in% palmtree$tip.label)])[1:5]
## [1] "Ammandra_natalia" "Ammandra_dasyneura" "Geonoma_weberbaueri"
## [4] "Geonoma_rubescens" "Geonoma_polyandra"
# Species that are NOT in the phylogeny
unique(species$species)[!(unique(species$species) %in% palmtree$tip.label)]
## [1] "Ammandra_natalia" "Ammandra_dasyneura"
## [3] "Geonoma_weberbaueri" "Geonoma_rubescens"
## [5] "Geonoma_polyandra" "Geonoma_poeppiginana"
## [7] "Geonoma_paraguensis" "Geonoma_myriantha"
## [9] "Geonoma_longevaginata" "Geonoma_longepedunculata"
## [11] "Geonoma_linearis" "Geonoma_jussieuana"
## [13] "Geonoma_gastoniana" "Geonoma_gamiova"
## [15] "Geonoma_densa" "Geonoma_brevispatha"
## [17] "Geonoma_arundinacea" "Geonoma_appuniana"
## [19] "Calyptrogyne_kunaria" "Calyptrogyne_condesata"
## [21] "Astrocaryum_gynacanthum" "Desmoncus_phoenicocarpus"
## [23] "Desmoncus_cirrhiferus" "Bactris_trailiana"
## [25] "Bactris_macana" "Bactris_horrispatha"
## [27] "Bactris_hondurensis" "Bactris_grayumi"
## [29] "Bactris_corrossilla" "Aiphanes_leidostachys"
## [31] "Aiphanes_aculeata" "Gastrococcus_crispa"
## [33] "Attalea_septuagenta" "Attalea_luetzelbergii"
## [35] "Polyandrococcus_caudescens" "Syagrus_shmithii"
## [37] "Syagrus_leptospatha" "Hypospathe_altissima"
## [39] "Hypospathe_macrorachis" "Hypospathe_elegans"
## [41] "Prestoea_roseospadix" "Euterpe_longebracteata"
## [43] "Leopoldinia_major" "Wettinia_angusta"
## [45] "Wettinia_aquatorialis" "Socratea_hecantonandra"
## [47] "Iriartella_stenocarpua" "Dictyocarpum_ptarianum"
## [49] "Dictyocarpum_lamarckianum" "Dictyocarpum_fuscum"
## [51] "Chamaedorea_vocanensis" "Chamaedorea_tuerkheimii"
## [53] "Chamaedorea_sullivaniorum" "Chamaedorea_selvae"
## [55] "Chamaedorea_orephila" "Chamaedorea_keeleriorum"
## [57] "Chamaedorea_hopperiana" "Chamaedorea_ernesti-angustii"
## [59] "Ceroxylon_werberbaueri" "Sabal_miamiensis"
## [61] "Sabal_guatemalensis" "Sabal_gretheriae"
## [63] "Brahea_nitida" "Acoelorraphe_wrightii"
## [65] "Coplothrinax_wrightii" "Coplothrinax_cookii"
## [67] "Coccothrinax_hiorami" "Thrinax_rivularis"
## [69] "Thrinax_morrisii" "Thrinax_ekmaniana"
## [71] "Thrinax_compacta" "Crysophila_williamsii"
## [73] "Crysophila_warscewiczii" "Crysophila_stauracantha"
## [75] "Crysophila_nana" "Crysophila_macrocarpa"
## [77] "Crysophila_kalbreyeri" "Crysophila_guagara"
## [79] "Crysophila_grayumii" "Crysophila_cookii"
# dplyr
setdiff(species$species, palmtree$tip.label)
## [1] "Ammandra_natalia" "Ammandra_dasyneura"
## [3] "Geonoma_weberbaueri" "Geonoma_rubescens"
## [5] "Geonoma_polyandra" "Geonoma_poeppiginana"
## [7] "Geonoma_paraguensis" "Geonoma_myriantha"
## [9] "Geonoma_longevaginata" "Geonoma_longepedunculata"
## [11] "Geonoma_linearis" "Geonoma_jussieuana"
## [13] "Geonoma_gastoniana" "Geonoma_gamiova"
## [15] "Geonoma_densa" "Geonoma_brevispatha"
## [17] "Geonoma_arundinacea" "Geonoma_appuniana"
## [19] "Calyptrogyne_kunaria" "Calyptrogyne_condesata"
## [21] "Astrocaryum_gynacanthum" "Desmoncus_phoenicocarpus"
## [23] "Desmoncus_cirrhiferus" "Bactris_trailiana"
## [25] "Bactris_macana" "Bactris_horrispatha"
## [27] "Bactris_hondurensis" "Bactris_grayumi"
## [29] "Bactris_corrossilla" "Aiphanes_leidostachys"
## [31] "Aiphanes_aculeata" "Gastrococcus_crispa"
## [33] "Attalea_septuagenta" "Attalea_luetzelbergii"
## [35] "Polyandrococcus_caudescens" "Syagrus_shmithii"
## [37] "Syagrus_leptospatha" "Hypospathe_altissima"
## [39] "Hypospathe_macrorachis" "Hypospathe_elegans"
## [41] "Prestoea_roseospadix" "Euterpe_longebracteata"
## [43] "Leopoldinia_major" "Wettinia_angusta"
## [45] "Wettinia_aquatorialis" "Socratea_hecantonandra"
## [47] "Iriartella_stenocarpua" "Dictyocarpum_ptarianum"
## [49] "Dictyocarpum_lamarckianum" "Dictyocarpum_fuscum"
## [51] "Chamaedorea_vocanensis" "Chamaedorea_tuerkheimii"
## [53] "Chamaedorea_sullivaniorum" "Chamaedorea_selvae"
## [55] "Chamaedorea_orephila" "Chamaedorea_keeleriorum"
## [57] "Chamaedorea_hopperiana" "Chamaedorea_ernesti-angustii"
## [59] "Ceroxylon_werberbaueri" "Sabal_miamiensis"
## [61] "Sabal_guatemalensis" "Sabal_gretheriae"
## [63] "Brahea_nitida" "Acoelorraphe_wrightii"
## [65] "Coplothrinax_wrightii" "Coplothrinax_cookii"
## [67] "Coccothrinax_hiorami" "Thrinax_rivularis"
## [69] "Thrinax_morrisii" "Thrinax_ekmaniana"
## [71] "Thrinax_compacta" "Crysophila_williamsii"
## [73] "Crysophila_warscewiczii" "Crysophila_stauracantha"
## [75] "Crysophila_nana" "Crysophila_macrocarpa"
## [77] "Crysophila_kalbreyeri" "Crysophila_guagara"
## [79] "Crysophila_grayumii" "Crysophila_cookii"
Missing species from the phylogeny may be synonyms of accepted
species names, and are therefore undetected in the phylogenetic
tree.
Let’s standardize the missing species names.
Write species missing from phylogeny into a vector
specmissing <- unique(species[which(!species$species %in% palmtree$tip.label), c(2:5)])
head(specmissing)
## spp_Id GENUS EPITHET species
## 1 550 Ammandra natalia Ammandra_natalia
## 173 548 Ammandra dasyneura Ammandra_dasyneura
## 808 541 Geonoma weberbaueri Geonoma_weberbaueri
## 3948 529 Geonoma rubescens Geonoma_rubescens
## 4035 528 Geonoma polyandra Geonoma_polyandra
## 4249 526 Geonoma poeppiginana Geonoma_poeppiginana
nrow(specmissing)
## [1] 80
Match palm species names to those in World Checklist of Vascular
Plants.
It is always important to check whether you have access to an updated
taxonomy. We rely on the TNRS
R package.
tnrs_names <- TNRS(taxonomic_names = specmissing$species)
# Alternative with rWCVP
# taxstand <- wcvp_match_names(specmissing,
# name_col = "species",
# id_col = "spp_Id")
# head(taxstand)
Replace old names by new names
tnrs_names$Accepted_species <- gsub(" ", "_", tnrs_names$Accepted_species)
# update the combined species name
for(i in 1:nrow(tnrs_names)){
species[which(species$species == tnrs_names[i, "Name_submitted"]), "species"] <-
tnrs_names[i, "Accepted_species"]
}
Check names that are not in the phylogeny
specmissing <- unique(species$species[which(!species$species %in% palmtree$tip.label)])
specmissing
## [1] "Geonoma_longepedunculata" ""
## [3] "Leopoldinia_major" "Acoelorraphe_wrightii"
length(specmissing)
## [1] 4
Instead of 80 missing species, we only have 4 after taxonomic
harmonization.
Remaining missing species would have to be looked up and added
manually to the phylogeny by an expert.
If missing species cannot be added reliably, we need to remove them
from the dataset for analysis.
# Removal of the species not in the phylogeny
length(unique(species$species))
## [1] 527
species <- species[which(!species$species %in% specmissing), ]
length(unique(species$species))
## [1] 523
Read in polygon shapefile
# gridcells corresponding to palm distribution data and americas coastline
grid <- st_read("data/30min_grid_select50/30min_grid_select50%.shp")
## Reading layer `30min_grid_select50%' from data source
## `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\30min_grid_select50\30min_grid_select50%.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 6038 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## CRS: NA
americas <- st_read("data/americas/americas.shp")
## Reading layer `americas' from data source
## `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\americas\americas.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7158 ymin: -55.91972 xmax: -29.84 ymax: 49.37666
## CRS: NA
# remember to define the Coordinate Reference System (CRS)
st_crs(grid) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(americas) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"
# Rename ID column for matching
names(grid)[1] <- "grid_id"
We will now convert the grid shapefile to a coarser resolution
to have fewer grid cells.
To do so, we first create an empty grid of a 2-degree resolution.
grid_2degrees <- st_make_grid(x = grid, cellsize = 2)%>%
st_sf() %>%
mutate(ID = row_number()) # add a unique ID to each grid cell
As we can see on the following plot, the aggregated grid has been made after the extent of the grid. We therefore mask it to remove empty cells:
grid_union <- st_union(grid)
sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
grid_2degrees <- st_intersection(grid_2degrees, grid_union)
## although coordinates are longitude/latitude, st_intersection assumes that they
## are planar
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
ggplot() +
geom_sf(data = grid_2degrees, fill = "white", color = "black") +
geom_sf(data = grid, fill = NA, color = "orange")+
geom_sf(data = grid_2degrees, fill = NA, color = "black", linewidth = 1) +
scale_x_continuous(limits = c(-80, -60)) +
scale_y_continuous(limits = c(20, 5))
We now make a spatial join between the original grid and the coarser
one. We have to specify largest = TRUE
in the function
st_join()
to only keep the coarser cell with the bigger
overlap.
grid <- st_join(grid, grid_2degrees, largest = TRUE)
## although coordinates are longitude/latitude, st_intersection assumes that they
## are planar
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
species <- inner_join(species, st_drop_geometry(grid), by = "grid_id")
head(species)
## grid_id spp_Id GENUS EPITHET species AREA PORTION ID
## 1 13735 550 Ammandra natalia Aphandra_natalia 0.25 100 675
## 2 13736 550 Ammandra natalia Aphandra_natalia 0.25 100 676
## 3 13737 550 Ammandra natalia Aphandra_natalia 0.25 100 676
## 4 13738 550 Ammandra natalia Aphandra_natalia 0.25 100 676
## 5 13910 550 Ammandra natalia Aphandra_natalia 0.25 100 675
## 6 13911 550 Ammandra natalia Aphandra_natalia 0.25 100 676
We can now remove species duplicates per new ID.
species_coarse <- species %>%
group_by(ID) %>%
distinct(species, .keep_all = TRUE) %>%
ungroup() %>%
dplyr::select(-grid_id, -spp_Id, -GENUS, -EPITHET, -AREA, -PORTION)
Alternative way: manual aggregation with a for-loop and the grid
centroids
Extract the grid centroids
# convert into points shapefile
grid_centroids <- st_centroid(grid)
ggplot() +
geom_sf(data = grid, fill = NA) +
geom_sf(data = grid_centroids, shape = 20) +
geom_sf(data = americas, fill = NA, color = "black") +
theme_void() +
scale_x_continuous(limits = c(-80, -60)) +
scale_y_continuous(limits = c(20, 5))
Add coordinates to SpatialPolygonsDataframe
# Add latitude and longitude of grid centroids as extra columns
head(grid)
grid <- cbind(grid, st_coordinates(grid_centroids))
names(grid)[4:5] <- c("longitude", "latitude")
head(grid)
Make new IDs for neighboring cells
Two steps: i) we build an empty matrix with the new IDs, ii) with a
for-loop
, we add the new IDs into the grid
object.
range(grid$longitude)
range(grid$latitude)
longitude <- seq(-116, -34, by = 2)
latitude <- seq(-35, 35, by = 2)
new_ID_matrix <- matrix(c(1:(length(longitude)*length(latitude))),
nrow = length(longitude), ncol = length(latitude),
dimnames = list(longitude, latitude), byrow = TRUE)
dim(new_ID_matrix)
new_ID_matrix[1:5, 1:5]
# Filling
grid$new_ID <- NA
for(i in 1:nrow(grid)){
grid$new_ID[i] <- new_ID_matrix[which(longitude < grid$longitude[i] &
(longitude + 2) > grid$longitude[i]),
which(latitude < grid$latitude[i] &
(latitude + 2) > grid$latitude[i])]
}
head(grid)
Join neighboring cells based on new IDs & aggregate species distribution data using the new grid
grid_2degrees <- group_by(grid, new_ID)
grid_2degrees <- summarise(grid_2degrees, do_union = TRUE, is_coverage = TRUE)
head(grid_2degrees)
ggplot(grid_2degrees) +
geom_sf(fill = "gray90") +
geom_sf(data = americas, fill = NA, color = "black") +
theme_void() + ylim(-40, 40)
# Add new ID also to species data.frame
species <- inner_join(species, st_drop_geometry(grid), by = "grid_id")
head(species)
Prune phylogeny to exclude species not in palm distribution
data
Make sure that the same number of species in your data set are in
your phylogeny
palmtree_pruned <- drop.tip(palmtree,
palmtree$tip.label[which(!palmtree$tip.label %in% unique(species$species))])
length(palmtree$tip.label)
## [1] 2539
length(palmtree_pruned$tip.label)
## [1] 518
length(unique(species$species))
## [1] 518
# write.nexus(palmtree_pruned, file = "data/palmtree_pruned.nex")
Convert long table to species by grid-cell table (a community matrix)
library("bioregion")
species_grid <- net_to_mat(species_coarse[, c("ID", "species")])
# Alternative without the package bioregion
# species <- unique(species[, c("new_ID", "species")])
# species_grid <- as.data.frame.matrix(table(species_coarse))
# species_grid <- as.matrix(species_grid)
dim(species_grid)
## [1] 479 518
species_grid[1:5, 1:5]
## Aphandra_natalia Ammandra_decasperma Phytelephas_tumacana
## 675 1 0 0
## 676 1 0 0
## 677 1 0 0
## 634 1 0 0
## 635 1 0 0
## Phytelephas_tenuicaulis Phytelephas_seemannii
## 675 1 0
## 676 1 0
## 677 1 0
## 634 1 0
## 635 1 0
# write.csv(species_grid, file = "data/palms_specsxsites_phylo.csv")
# if you have a hard time reading in the large palm phylogeny or
# in case you got stuck above
palmtree_pruned <- read.nexus("data/palmtree_pruned.nex")
species_grid <- as.matrix(read.csv("data/palms_specsxsites_phylo.csv", row.names = 1))
Calculate phylogenetic diversity
We here calculate the Faith’s PD (phylogenetic diversity index),
which can be written like this:
PD=∑branchesl(b) with l the length of a
branch b.
# ?pd
pd_palms <- pd(species_grid, palmtree_pruned)
dim(pd_palms) # one value per grid cell
## [1] 479 2
head(pd_palms)
## PD SR
## 675 1575.824 69
## 676 1465.944 60
## 677 1524.880 62
## 634 1467.002 61
## 635 1525.790 62
## 636 1642.958 71
# Adding the ID of grid cells
pd_palms <- data.frame(ID = as.integer(row.names(species_grid)),
pd_palms)
head(pd_palms)
## ID PD SR
## 675 675 1575.824 69
## 676 676 1465.944 60
## 677 677 1524.880 62
## 634 634 1467.002 61
## 635 635 1525.790 62
## 636 636 1642.958 71
pd()
calculates Faith’s PD and a corrected version of it
being the residuals of a regression with total abundance or species
richness per plot. Let’s look into this:
pd_model <- lm(PD ~ SR, data = pd_palms)
pd_palms$residuals <- residuals(pd_model)
head(pd_palms)
## ID PD SR residuals
## 675 675 1575.824 69 -29.973347
## 676 676 1465.944 60 44.204392
## 677 677 1524.880 62 62.238817
## 634 634 1467.002 61 24.812397
## 635 635 1525.790 62 63.148715
## 636 636 1642.958 71 -3.741623
plot_grid(
nrow = 1, ncol = 2,
ggplot(pd_palms, aes(SR, PD)) +
geom_point(color = "black", alpha = 0.3) +
stat_smooth(method = "lm", formula = y ~ x, color = "red") +
labs(x = "Species_number", y = "Faith's PD") +
theme_bw(),
ggplot(pd_palms, aes(SR, residuals)) +
geom_point(color = "black", alpha = 0.3) +
labs(x = "Species_number", y = "Faith's PD residuals") +
theme_bw()
)
Calculate generic phylogenetic metrics (absolute and standardized)
Phylogenetic diversity metrics are standardized by first using a null
model to generate expected values.
The principle of null models is illustrated in the following figure:
Then, the observed value is compared to the null, usually using
standardized effect sizes (SES):
SES=(observed−mean.null)/sd.null where mean.null is the
mean of the null values and sd.null is the standard deviation
of the null values.
Null models can also apply to other phylogenetic diversity indices like the Mean Pairwise Distance (MPD) and the Mean Nearest Taxon Distance (MNTD).
MPD consists in calculating all the pairwise distances between
species in a grid cell and then takes the mean.
MNTD calculates the
mean distance separating each species in the grid cell from its closest
relative. These indices are described in Webb el al. (2002).
c.data <- comparative.comm(palmtree_pruned , species_grid)
pd.null <- generic.null(c.data, c(.pd, .mpd, .mntd),
null.model = "richness",
comp.fun = .ses, permute = 100)
colnames(pd.null) <- c("Faith_PD", "Corrected_FaithPD", "MPD", "MNTD")
rownames(pd.null) <- sites(c.data)
pd.null <- data.frame(pd.null)
pd.null$ID <- as.numeric(rownames(pd.null))
dim(pd.null); head(pd.null)
## [1] 479 21
## Faith_PD.observed Corrected_FaithPD.observed MPD.observed MNTD.observed
## 1000 1161.9594 210.59122 126.3598 44.62270
## 1001 1020.6110 253.30102 123.3111 51.51361
## 1034 475.7734 97.03081 141.8437 64.31507
## 1035 543.9522 42.50417 122.3554 44.78403
## 1036 862.3636 13.25003 109.3301 34.82029
## 1037 1101.8782 -53.99906 105.3146 28.25024
## Faith_PD.null.mean Corrected_FaithPD.null.mean MPD.null.mean
## 1000 1234.5055 84.13755 137.9975
## 1001 1027.9350 95.14096 136.8423
## 1034 506.7864 33.31505 136.8407
## 1035 691.9129 73.39233 135.1412
## 1036 1109.5848 80.09129 136.1568
## 1037 1411.6192 19.50242 136.1205
## MNTD.null.mean Faith_PD.SE Corrected_FaithPD.SE MPD.SE MNTD.SE
## 1000 40.15389 8.440552 8.420466 0.5828615 0.4672596
## 1001 44.45246 8.624029 8.571403 0.6379790 0.6467057
## 1034 77.77521 6.052392 6.004969 1.1561112 1.8083958
## 1035 58.36659 6.663936 6.705964 0.8910013 1.0680532
## 1036 42.40147 8.190609 8.110140 0.5391931 0.5823474
## 1037 36.05080 10.138888 10.075718 0.4841846 0.4767682
## Faith_PD.SES Corrected_FaithPD.SES MPD.SES MNTD.SES Faith_PD.rank
## 1000 -8.5949529 15.017420 -19.96651 9.563858 0.20
## 1001 -0.8492592 18.452062 -21.20949 10.918635 0.50
## 1034 -5.1240790 10.610507 4.32742 -7.443134 0.33
## 1035 -22.2031939 -4.606073 -14.34994 -12.717120 0.02
## 1036 -30.1834992 -8.241690 -49.75346 -13.018307 0.01
## 1037 -30.5497980 -7.294912 -63.62434 -16.361326 0.01
## Corrected_FaithPD.rank MPD.rank MNTD.rank ID
## 1000 0.94 0.05 0.81 1000
## 1001 1.00 0.03 0.87 1001
## 1034 0.86 0.66 0.21 1034
## 1035 0.34 0.09 0.12 1035
## 1036 0.20 0.01 0.12 1036
## 1037 0.26 0.01 0.06 1037
# select columns of interest for final data set
pd.out <- pd.null[, c("ID", "Faith_PD.observed", "Faith_PD.SES",
"MPD.observed", "MPD.SES", "MNTD.observed", "MNTD.SES")]
head(pd.out)
## ID Faith_PD.observed Faith_PD.SES MPD.observed MPD.SES MNTD.observed
## 1000 1000 1161.9594 -8.5949529 126.3598 -19.96651 44.62270
## 1001 1001 1020.6110 -0.8492592 123.3111 -21.20949 51.51361
## 1034 1034 475.7734 -5.1240790 141.8437 4.32742 64.31507
## 1035 1035 543.9522 -22.2031939 122.3554 -14.34994 44.78403
## 1036 1036 862.3636 -30.1834992 109.3301 -49.75346 34.82029
## 1037 1037 1101.8782 -30.5497980 105.3146 -63.62434 28.25024
## MNTD.SES
## 1000 9.563858
## 1001 10.918635
## 1034 -7.443134
## 1035 -12.717120
## 1036 -13.018307
## 1037 -16.361326
# change NAs to zeroes; zero phylogenetic diversity if grid has only 1 spp!
pd.out[is.na(pd.out)] <- 0
head(pd.out)
## ID Faith_PD.observed Faith_PD.SES MPD.observed MPD.SES MNTD.observed
## 1000 1000 1161.9594 -8.5949529 126.3598 -19.96651 44.62270
## 1001 1001 1020.6110 -0.8492592 123.3111 -21.20949 51.51361
## 1034 1034 475.7734 -5.1240790 141.8437 4.32742 64.31507
## 1035 1035 543.9522 -22.2031939 122.3554 -14.34994 44.78403
## 1036 1036 862.3636 -30.1834992 109.3301 -49.75346 34.82029
## 1037 1037 1101.8782 -30.5497980 105.3146 -63.62434 28.25024
## MNTD.SES
## 1000 9.563858
## 1001 10.918635
## 1034 -7.443134
## 1035 -12.717120
## 1036 -13.018307
## 1037 -16.361326
head(pd_palms)
## ID PD SR residuals
## 675 675 1575.824 69 -29.973347
## 676 676 1465.944 60 44.204392
## 677 677 1524.880 62 62.238817
## 634 634 1467.002 61 24.812397
## 635 635 1525.790 62 63.148715
## 636 636 1642.958 71 -3.741623
palmdiversity <- left_join(pd_palms, pd.out, by = "ID")
head(palmdiversity)
## ID PD SR residuals Faith_PD.observed Faith_PD.SES MPD.observed
## 1 675 1575.824 69 -29.973347 1575.824 -21.87064 125.8718
## 2 676 1465.944 60 44.204392 1465.944 -16.84119 126.7775
## 3 677 1524.880 62 62.238817 1524.880 -14.66500 127.4341
## 4 634 1467.002 61 24.812397 1467.002 -19.34955 130.6003
## 5 635 1525.790 62 63.148715 1525.790 -13.95727 129.7774
## 6 636 1642.958 71 -3.741623 1642.958 -15.60304 128.3756
## MPD.SES MNTD.observed MNTD.SES
## 1 -35.51768 26.52567 -11.484193
## 2 -27.25207 27.22376 -13.491031
## 3 -24.40306 26.70619 -15.882595
## 4 -20.47196 28.74301 -9.606953
## 5 -20.87424 28.67301 -7.675902
## 6 -24.35456 25.54811 -15.772418
# Plot
plot_grid(
nrow = 1, ncol = 2,
ggplot(palmdiversity, aes(SR, Faith_PD.SES)) +
geom_point(color = "black", alpha = 0.3) +
stat_smooth(method = "lm", formula = y ~ x, color = "red") +
labs(x = "Species_number", y = "Faith's PD standardized") +
theme_bw(),
ggplot(palmdiversity, aes(Faith_PD.observed, Faith_PD.SES)) +
geom_point(color = "black", alpha = 0.3) +
stat_smooth(method = "lm", formula = y ~ x, color = "red") +
labs(x = "Faith's PD observed", y = "Faith's PD standardized") +
theme_bw()
)
Plot correlations among the diversity metrics in the palm diversity
data.frame (use ggpairs()
).
ggpairs(palmdiversity[, c(3:10)],
upper = list(continuous = wrap(ggally_cor, digits = 1))) +
theme_bw()
# psych::pairs.panels(palmdiversity[, c(3:10)], method = "pearson",
# density = FALSE, ellipses = FALSE, hist.col = "white")
Join the palmdiversity
data.frame to the grid_2degrees
shapefile to be able to link the diversity metrics and the poylgons for
plotting.
head(grid_2degrees); head(palmdiversity)
## Simple feature collection with 6 features and 1 field
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -72.00066 ymin: -35 xmax: -58.00066 ymax: -33
## Geodetic CRS: WGS 84
## ID geometry
## 22 22 MULTILINESTRING ((-72.00066...
## 23 23 POLYGON ((-71.50066 -35, -7...
## 26 26 GEOMETRYCOLLECTION (POLYGON...
## 27 27 POLYGON ((-64.00066 -33, -6...
## 28 28 POLYGON ((-62.00066 -33, -6...
## 29 29 MULTIPOLYGON (((-58.00066 -...
## ID PD SR residuals Faith_PD.observed Faith_PD.SES MPD.observed
## 1 675 1575.824 69 -29.973347 1575.824 -21.87064 125.8718
## 2 676 1465.944 60 44.204392 1465.944 -16.84119 126.7775
## 3 677 1524.880 62 62.238817 1524.880 -14.66500 127.4341
## 4 634 1467.002 61 24.812397 1467.002 -19.34955 130.6003
## 5 635 1525.790 62 63.148715 1525.790 -13.95727 129.7774
## 6 636 1642.958 71 -3.741623 1642.958 -15.60304 128.3756
## MPD.SES MNTD.observed MNTD.SES
## 1 -35.51768 26.52567 -11.484193
## 2 -27.25207 27.22376 -13.491031
## 3 -24.40306 26.70619 -15.882595
## 4 -20.47196 28.74301 -9.606953
## 5 -20.87424 28.67301 -7.675902
## 6 -24.35456 25.54811 -15.772418
grid_2degrees <- left_join(grid_2degrees, palmdiversity, by = "ID")
head(grid_2degrees)
## Simple feature collection with 6 features and 10 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -72.00066 ymin: -35 xmax: -58.00066 ymax: -33
## Geodetic CRS: WGS 84
## ID PD SR residuals Faith_PD.observed Faith_PD.SES MPD.observed MPD.SES
## 1 22 NA NA NA NA NA NA NA
## 2 23 104.614 1 -110.5213 104.614 1.461737 0 0
## 3 26 104.614 1 -110.5213 104.614 4.109849 0 0
## 4 27 104.614 1 -110.5213 104.614 5.236266 0 0
## 5 28 104.614 1 -110.5213 104.614 5.873155 0 0
## 6 29 104.614 1 -110.5213 104.614 6.043440 0 0
## MNTD.observed MNTD.SES geometry
## 1 NA NA MULTILINESTRING ((-72.00066...
## 2 0 0 POLYGON ((-71.50066 -35, -7...
## 3 0 0 GEOMETRYCOLLECTION (POLYGON...
## 4 0 0 POLYGON ((-64.00066 -33, -6...
## 5 0 0 POLYGON ((-62.00066 -33, -6...
## 6 0 0 MULTIPOLYGON (((-58.00066 -...
Plot maps with colours according to the diversity metrics.
Species richness, PD, MPD & MNTD (absolute)
plot_grid(
nrow = 2, ncol = 2,
ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
geom_sf(color = NA, aes(fill = SR)) +
geom_sf(data = americas, fill = NA, color = "black") +
labs(title = "Palm species richness") +
scale_fill_viridis_c("Species\nnumber") +
theme_void() +
theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) +
ylim(-40, 40),
ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
geom_sf(color = NA, aes(fill = PD)) +
geom_sf(data = americas, fill = NA, color = "black") +
labs(title = "Faith's Phylogenetic diversity") +
scale_fill_viridis_c("PD (my)") +
theme_void() +
theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) +
ylim(-40, 40),
ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
geom_sf(color = NA, aes(fill = MPD.observed)) +
geom_sf(data = americas, fill = NA, color = "black") +
labs(title = "Mean pairwise distance") +
scale_fill_viridis_c("MPD (my)") +
theme_void() +
theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) +
ylim(-40, 40),
ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
geom_sf(color = NA, aes(fill = MNTD.observed)) +
geom_sf(data = americas, fill = NA, color = "black") +
labs(title = "Mean nearest taxon distance") +
scale_fill_viridis_c("MNTD (my)") +
theme_void() +
theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) +
ylim(-40, 40)
)
# par(mfrow = c(2, 2), mar = rep(1, 4), oma = rep(0, 4))
#
# my.class.fr <- classIntervals(grid_2degrees@data$SR, n = 10, style = "equal") # bin data into n quantiles
# my.pal <- matlab.like(10)
# my.col.fr <- findColours(my.class.fr, my.pal) # ramp colors based on classInts
#
# plot(grid_2degrees, col = my.col.fr, border = NA, axes = FALSE)
# title(main = "Species_richness", line = -3)
# plot(americas, add = TRUE)
#
# my.class.fr <- classIntervals(grid_2degrees@data$Faith_PD.observed,
# n = 10, style = "equal") # bin data into n quantiles
# my.pal <- matlab.like(10)
# my.col.fr <- findColours(my.class.fr, my.pal) # ramp colors based on classInts
#
# plot(grid_2degrees, col = my.col.fr, border = NA, axes = FALSE)
# title(main = "Faith's PD",line = -3)
# plot(americas, add = TRUE)
#
# my.class.mpd <- classIntervals(grid_2degrees@data$MPD.observed,
# n = 10, style = "equal") # bin data into n quantiles
# my.pal2 <- matlab.like(10)
# my.col.mpd <- findColours(my.class.mpd, my.pal2) # ramp colors based on classInts
#
# plot(grid_2degrees, col = my.col.mpd, border = NA, axes = FALSE)
# title(main = "MPD", line = -3)
# plot(americas, add = TRUE)
#
# my.class.mntd <- classIntervals(grid_2degrees@data$MNTD.observed,
# n = 10, style = "equal") # bin data into n quantiles
# my.pal3 <- matlab.like(10)
# my.col.mntd <- findColours(my.class.mntd, my.pal3) # ramp colors based on classInts
#
# plot(grid_2degrees, col = my.col.mntd, border = NA, axes = FALSE)
# title(main = "MNTD", line = -3)
# plot(americas, add = TRUE)
PD, MPD & MNTD (standardized)
Maps.