Preparation: Please open the RStudio project we
established last week and download all data for the class from Stud-IP
(log in first) and unzip them into your project folder /data/. You can
use getwd()
to get your current working directory which is
your project folder as long as you do not change it. Please install the
following R-packages using install.packages()
if not done
yet:
sf
terra
dplyr
ggplot2
If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:
install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day2.html", destfile = file.path(dir, "Day2.html"))
htmlFile <- file.path(dir, "Day2.html")
rstudioapi::viewer(htmlFile)
Now you can conveniently copy code from the viewer into your script.
We will work with palm species distributed in the Neotropics described inn Kreft, H., Sommer, J.H. & Barthlott, W. (2006) The significance of geographic range size for spatial diversity patterns in Neotropical palms. Ecography, 29, 21-30.
Load data
To import a table into R, we use the function
read.table()
:
species <- read.table("data/palms_species_per_gridcell.csv",
header = TRUE, sep = ",")
species <- read.csv("data/palms_species_per_gridcell.csv")
Inspection
str(species)
## 'data.frame': 140860 obs. of 4 variables:
## $ grid_id: int 13735 13736 13737 13738 13910 13911 13912 13913 13914 14085 ...
## $ spp_Id : int 550 550 550 550 550 550 550 550 550 550 ...
## $ GENUS : chr "Ammandra" "Ammandra" "Ammandra" "Ammandra" ...
## $ EPITHET: chr "natalia" "natalia" "natalia" "natalia" ...
dim(species)
## [1] 140860 4
colnames(species) # rownames(species) for rows
## [1] "grid_id" "spp_Id" "GENUS" "EPITHET"
How many genera do we have?
unique(species$GENUS)
## [1] "Ammandra" "Phytelephas" "Geonoma" "Asterogyne"
## [5] "Calyptrogyne" "Calyptronoma" "Welfia" "Pholidostachys"
## [9] "Astrocaryum" "Desmoncus" "Bactris" "Aiphanes"
## [13] "Gastrococcus" "Acrocomia" "Elaeis" "Barcella"
## [17] "Attalea" "Polyandrococcus" "Allagoptera" "Parajubaea"
## [21] "Lytocaryum" "Syagrus" "Cocos" "Jubaea"
## [25] "Butia" "Roystonea" "Hypospathe" "Oenocarpus"
## [29] "Neonicholsonia" "Prestoea" "Euterpe" "Reinhardtia"
## [33] "Leopoldinia" "Manicaria" "Wettinia" "Socratea"
## [37] "Iriartea" "Iriartella" "Dictyocarpum" "Wendlandiella"
## [41] "Chamaedorea" "Synechanthus" "Gaussia" "Juania"
## [45] "Ceroxylon" "Pseudophoenix" "Lepidocaryum" "Mauritiella"
## [49] "Mauritia" "Raphia" "Sabal" "Washingtonia"
## [53] "Copernicia" "Brahea" "Serenoa" "Acoelorraphe"
## [57] "Coplothrinax" "Rhapidophyllum" "Zombia" "Coccothrinax"
## [61] "Thrinax" "Schippia" "Itaya" "Crysophila"
## [65] "Chelyocarpus" "Trithrinax"
length(unique(species$GENUS)) # or dplyr::n_distinct(species$GENUS)
## [1] 66
Adding a species column to the data.frame
paste()
allows us to combine both character strings of
genus and epithet into a full species name.
species$species <- paste(species$GENUS, species$EPITHET, sep = " ")
How many species do we have?
length(unique(species$species)) # or dplyr::n_distinct(species$species)
## [1] 550
We can make a table with only one entry per species, i.e. we remove duplicates:
species_unique <- species[!duplicated(species$species), ]
# or species_unique <- unique(species[, c("spp_Id", "GENUS", "EPITHET", "species")])
Using table
, we can see how many species there are per
genus:
spec_genera <- table(species_unique$GENUS) # contingency table of the counts of factor levels.
spec_genera
##
## Acoelorraphe Acrocomia Aiphanes Allagoptera Ammandra
## 1 2 22 4 3
## Asterogyne Astrocaryum Attalea Bactris Barcella
## 5 18 29 64 1
## Brahea Butia Calyptrogyne Calyptronoma Ceroxylon
## 9 8 8 3 11
## Chamaedorea Chelyocarpus Coccothrinax Cocos Copernicia
## 77 4 14 1 13
## Coplothrinax Crysophila Desmoncus Dictyocarpum Elaeis
## 2 9 7 3 1
## Euterpe Gastrococcus Gaussia Geonoma Hypospathe
## 7 1 5 51 3
## Iriartea Iriartella Itaya Juania Jubaea
## 1 2 1 1 1
## Leopoldinia Lepidocaryum Lytocaryum Manicaria Mauritia
## 3 1 2 1 2
## Mauritiella Neonicholsonia Oenocarpus Parajubaea Pholidostachys
## 3 1 9 2 4
## Phytelephas Polyandrococcus Prestoea Pseudophoenix Raphia
## 6 1 11 4 1
## Reinhardtia Rhapidophyllum Roystonea Sabal Schippia
## 6 1 9 16 1
## Serenoa Socratea Syagrus Synechanthus Thrinax
## 1 5 30 2 7
## Trithrinax Washingtonia Welfia Wendlandiella Wettinia
## 3 2 1 1 21
## Zombia
## 1
These frequencies can be plotted using a histogram:
library("ggplot2")
# Convert to a data.frame
spec_genera <- data.frame(genus = names(spec_genera),
count = as.numeric(spec_genera))
# Histogram
ggplot(spec_genera, aes(count)) +
geom_histogram(color= "black", fill = "grey90") +
labs(title = "Histogram of species per genus",
x = "Species per genus", y = "Frequency") +
theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Below is an alternative using forcats::fct_infreq
:
ggplot(species, aes(x = forcats::fct_infreq(GENUS))) +
geom_histogram(stat = "count", color= "black", fill = "grey90") +
labs(title = "Histogram of species per genus",
x = "Species per genus", y = "Frequency") +
theme_bw()
## Warning in geom_histogram(stat = "count", color = "black", fill = "grey90"):
## Ignoring unknown parameters: `binwidth`, `bins`, and `pad`
How many species belong to the genus Ammandra?
which(species$GENUS == "Ammandra") # index of all entries of genus "Ammandra"
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182
head(species[which(species$GENUS == "Ammandra"), ]) # first six rows
## grid_id spp_Id GENUS EPITHET species
## 1 13735 550 Ammandra natalia Ammandra natalia
## 2 13736 550 Ammandra natalia Ammandra natalia
## 3 13737 550 Ammandra natalia Ammandra natalia
## 4 13738 550 Ammandra natalia Ammandra natalia
## 5 13910 550 Ammandra natalia Ammandra natalia
## 6 13911 550 Ammandra natalia Ammandra natalia
# What species belong to this genus?
unique(species[which(species$GENUS == "Ammandra"), "species"])
## [1] "Ammandra natalia" "Ammandra decasperma" "Ammandra dasyneura"
length(unique(species[which(species$GENUS == "Ammandra"), "species"]))
## [1] 3
OR
How many species belong to the genus Ammandra or
Washingtonia?
unique(species$species[which(species$GENUS == "Ammandra" |
species$GENUS == "Washingtonia")]) # all species names in the genera "Ammandra" and "Washingtonia"
## [1] "Ammandra natalia" "Ammandra decasperma" "Ammandra dasyneura"
## [4] "Washingtonia robusta" "Washingtonia filifera"
unique(species$species[which(species$GENUS %in% c("Ammandra", "Washingtonia"))])
## [1] "Ammandra natalia" "Ammandra decasperma" "Ammandra dasyneura"
## [4] "Washingtonia robusta" "Washingtonia filifera"
length(unique(species$species[which(species$GENUS %in%
c("Ammandra", "Washingtonia"))]))
## [1] 5
AND
In how many grid cells can we find the species Ammandra
decasperma?
head(species[which(species$GENUS=="Ammandra" &
species$EPITHET == "decasperma"), ]) # all entries of Ammandra decasperma
## grid_id spp_Id GENUS EPITHET species
## 153 10937 549 Ammandra decasperma Ammandra decasperma
## 154 10938 549 Ammandra decasperma Ammandra decasperma
## 155 11112 549 Ammandra decasperma Ammandra decasperma
## 156 11113 549 Ammandra decasperma Ammandra decasperma
## 157 11286 549 Ammandra decasperma Ammandra decasperma
## 158 11287 549 Ammandra decasperma Ammandra decasperma
head(species[which(species$species == "Ammandra decasperma"), ])
## grid_id spp_Id GENUS EPITHET species
## 153 10937 549 Ammandra decasperma Ammandra decasperma
## 154 10938 549 Ammandra decasperma Ammandra decasperma
## 155 11112 549 Ammandra decasperma Ammandra decasperma
## 156 11113 549 Ammandra decasperma Ammandra decasperma
## 157 11286 549 Ammandra decasperma Ammandra decasperma
## 158 11287 549 Ammandra decasperma Ammandra decasperma
nrow(species[which(species$species == "Ammandra decasperma"), ])
## [1] 20
Make a subset of the data.frame that only includes the following 4 species Geonoma pauciflora, Geonoma weberbaueri, Phytelephas tenuicaulis & Mauritia flexuosa.
subset_sp <- c("Geonoma pauciflora", "Geonoma weberbaueri",
"Phytelephas tenuicaulis", "Mauritia flexuosa")
sub_sp <- species[which(species$species %in% subset_sp), ]
According to the World Checklist of Vascular Plants, Geonoma weberbaueri is a synonym of Geonoma undata. Rename it!
unique(sub_sp$species)
## [1] "Phytelephas tenuicaulis" "Geonoma weberbaueri"
## [3] "Geonoma pauciflora" "Mauritia flexuosa"
sub_sp[which(sub_sp$species == "Geonoma weberbaueri"), "species"] <-
"Geonoma undata"
unique(sub_sp$species)
## [1] "Phytelephas tenuicaulis" "Geonoma undata"
## [3] "Geonoma pauciflora" "Mauritia flexuosa"
In how many grid cells does each of these species occur?
table(sub_sp$species)
##
## Geonoma pauciflora Geonoma undata Mauritia flexuosa
## 224 334 2858
## Phytelephas tenuicaulis
## 135
Other operators that can be used are: which()
and
operators: ==
, >=
, >
,
<
, <=
, %in%
,
is.na()
, is.finite()
, !
,
&
, |
We now load the spatial grid, containing the geometries for all the cell IDs.
Loading Geospatial R packages
install.packages("sf")
library("sf")
library("terra")
Load spatial poylgons into R We will use the two shapefiles 30min_grid_select50%.shp and americas.shp
grid <- st_read("data/30min_grid_select50/30min_grid_select50%.shp")
## Reading layer `30min_grid_select50%' from data source
## `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\30min_grid_select50\30min_grid_select50%.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 6038 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## CRS: NA
americas <- st_read("data/americas/americas.shp")
## Reading layer `americas' from data source
## `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\americas\americas.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7158 ymin: -55.91972 xmax: -29.84 ymax: 49.37666
## CRS: NA
str(grid)
## Classes 'sf' and 'data.frame': 6038 obs. of 4 variables:
## $ ID : num 538 539 540 712 713 714 715 716 885 886 ...
## $ AREA : num 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
## $ PORTION : num 100 100 100 100 100 100 100 100 100 100 ...
## $ geometry:sfc_MULTIPOLYGON of length 6038; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:5, 1:2] -114 -115 -115 -114 -114 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
## ..- attr(*, "names")= chr [1:3] "ID" "AREA" "PORTION"
ext(grid)
## SpatExtent : -116.000656, -35.000656, -35, 36.5 (xmin, xmax, ymin, ymax)
class(americas)
## [1] "sf" "data.frame"
ext(americas)
## SpatExtent : -124.715843200684, -29.8400001525879, -55.9197235107422, 49.3766555786133 (xmin, xmax, ymin, ymax)
Access the attribute table
head(grid)
## Simple feature collection with 6 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -115.0007 ymin: 35.5 xmax: -113.0007 ymax: 36.5
## CRS: NA
## ID AREA PORTION geometry
## 1 538 0.25 100 MULTIPOLYGON (((-114.0007 3...
## 2 539 0.25 100 MULTIPOLYGON (((-113.5007 3...
## 3 540 0.25 100 MULTIPOLYGON (((-113.0007 3...
## 4 712 0.25 100 MULTIPOLYGON (((-114.5007 3...
## 5 713 0.25 100 MULTIPOLYGON (((-114.0007 3...
## 6 714 0.25 100 MULTIPOLYGON (((-113.5007 3...
CRS
st_crs(grid)
## Coordinate Reference System: NA
st_crs(americas)
## Coordinate Reference System: NA
CRS is not defined yet for any of them!
st_crs(grid) <- "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(americas) <- "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(grid)
## Coordinate Reference System:
## User input: +proj=longlat +ellps=WGS84 +no_defs
## wkt:
## GEOGCRS["unknown",
## DATUM["Unknown based on WGS 84 ellipsoid",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1],
## ID["EPSG",7030]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]],
## CS[ellipsoidal,2],
## AXIS["longitude",east,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]],
## AXIS["latitude",north,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]]]
st_geometry(grid)
## Geometry set for 6038 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## Geodetic CRS: +proj=longlat +ellps=WGS84 +no_defs
## First 5 geometries:
## MULTIPOLYGON (((-114.0007 36, -114.5007 36, -11...
## MULTIPOLYGON (((-113.5007 36, -114.0007 36, -11...
## MULTIPOLYGON (((-113.0007 36, -113.5007 36, -11...
## MULTIPOLYGON (((-114.5007 35.5, -115.0007 35.5,...
## MULTIPOLYGON (((-114.0007 35.5, -114.5007 35.5,...
Check out http://proj4.org/ or http://epsg.io/ for spatial projections.
We can plot both objects:
ggplot(grid) +
geom_sf() +
geom_sf(data = americas, fill = NA) +
theme_bw()
# Plot with base R
# par(mfrow = c(1, 2))
# plot(st_geometry(grid), border = "darkgrey")
# plot(st_geometry(americas), add=TRUE)
If we zoom in, we will see better the grid cells:
ggplot(grid) +
geom_sf() +
geom_sf(data = americas, fill = NA, color = "black") +
scale_x_continuous(limits = c(-85, -65)) +
scale_y_continuous(limits = c(5, 25)) +
theme_bw()
# plot(st_geometry(grid), border = "darkgrey",
# ylim = c(5, 25), xlim = c(-85, -65))
# plot(st_geometry(americas), add = TRUE)
To perform a join between two tables having a column in common, we
rely on the functions available in the dplyr
library.
library("dplyr")
To make a join between two tables, they need to have a column in common. The grid ID is the column in common in our case, but it is not named the same. Let’s rename one.
colnames(grid)[1] <- "grid_id"
grid_species <- left_join(grid, species, by = c("grid_id"))
subset_sp <- c("Geonoma pauciflora", "Geonoma undata",
"Phytelephas tenuicaulis", "Mauritia flexuosa")
ggplot(grid_species[which(grid_species$species %in% subset_sp), ]) +
geom_sf(data = grid, color = "grey90", fill = "grey90") +
geom_sf(data = americas, fill = NA) +
geom_sf(aes(color = species, fill = species), show.legend = FALSE) +
theme_bw() +
facet_wrap(~species)
Convert long table to species by grid-cell table There are
several options to do this, we can use functions from existing R package
like bioregion
or base R approaches.
library("bioregion")
species <- species[, c("grid_id", "species")]
species_grid <- net_to_mat(net = species)
# Alternative
species_grid <- as.data.frame.matrix(table(species)) # Count species by grid cell combinations
species_grid[1:3, 1:3]
The grid IDs are stored in the rows now, let’s put them as the first column of the wide-format data.frame:
species_grid <- data.frame(grid_id = as.integer(row.names(species_grid)),
species_grid)
species_grid[1:3, 1:3]
## grid_id Ammandra.natalia Ammandra.decasperma
## 13735 13735 1 0
## 13736 13736 1 0
## 13737 13737 1 0
Join palm occurrences and spatial shapefile
grid_0 <- left_join(grid_0, species_grid, by = "grid_id") # Perform a "SQL" left join
grid_0[is.na(grid_0)] <- 0
grid_0[which(grid_0$grid_id %in% c(5328:5332)), 1:5]
## Simple feature collection with 5 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -82.00066 ymin: 22.5 xmax: -79.50066 ymax: 23
## Geodetic CRS: +proj=longlat +ellps=WGS84 +no_defs
## grid_id AREA PORTION Ammandra.natalia Ammandra.decasperma
## 558 5328 0.19638 78.552 0 0
## 559 5329 0.25000 100.000 0 0
## 560 5330 0.24921 99.684 0 0
## 561 5331 0.22011 88.044 0 0
## 562 5332 0.13200 52.800 0 0
## geometry
## 558 MULTIPOLYGON (((-81.50066 2...
## 559 MULTIPOLYGON (((-81.00066 2...
## 560 MULTIPOLYGON (((-80.50066 2...
## 561 MULTIPOLYGON (((-80.3008 23...
## 562 MULTIPOLYGON (((-79.93695 2...
Make distribution maps for the four species
ggplot(grid_0) +
geom_sf(aes(color = ifelse(Geonoma.pauciflora == 1, "forestgreen",
"gray90"),
fill = ifelse(Geonoma.pauciflora == 1, "forestgreen",
"gray90")),
show.legend = FALSE) +
geom_sf(data = americas, fill = NA) +
scale_color_identity() +
scale_fill_identity() +
labs(title = expression(italic("Geonoma pauciflora"))) +
theme_bw()
Below is the equivalent plotting made from base R code:
# One species
plot(st_geometry(grid_0),
col = ifelse(grid_0$Geonoma.undata == 1, "forestgreen", "grey80"),
border=NA, main = expression(italic("Geonoma pauciflora")))
plot(st_geometry(americas), add=TRUE)
Below is the equivalent plotting made from base R code:
# Four species together
plot(grid_0[c("Geonoma.pauciflora", "Geonoma.undata",
"Phytelephas.tenuicaulis", "Mauritia.flexuosa")],
border = NA, pal = c("grey80", "forestgreen"))
To go further on the topic of plots with spatial data, you can have a look at these nice tutorials: http://edrub.in/ARE212/section12.html, https://r-spatial.github.io/sf/articles/sf5.html