Before starting
Install software
Create a new project in RStudio
Do this only once for the entire practical part of the class and once again for your project in the end but not for each session separately
If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:
install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day1.html", destfile = file.path(dir, "Day1.html"))
htmlFile <- file.path(dir, "Day1.html")
rstudioapi::viewer(htmlFile)
Now you can conveniently copy code from the viewer into your script.
vegetation_type | plot | day | rich |
---|---|---|---|
forest | 1 | 1 | 5 |
forest | 2 | 3 | 7 |
forest | 3 | 5 | 11 |
forest | 4 | 7 | 18 |
forest | 5 | 9 | 4 |
shrubland | 1 | 11 | 18 |
shrubland | 2 | 13 | 19 |
shrubland | 3 | 15 | 13 |
shrubland | 4 | 17 | 13 |
shrubland | 5 | 19 | 1 |
grassland | 1 | 21 | 4 |
grassland | 2 | 23 | 4 |
grassland | 3 | 25 | 14 |
grassland | 4 | 27 | 8 |
grassland | 5 | 29 | 15 |
What would be the R code needed to create such a table?
Species richness can be made up of random numbers.
plots <- data.frame(vegetation_type = rep(c("forest", "shrubland", "grassland"),
each = 5),
plot = rep(1:5, 3))
plots$day <- seq(1, 29, by = 2)
set.seed(1)
plots$rich <- round(runif(n = nrow(plots), min = 0, max = 20), 0)
To subset part of the data, we use the operator [], or $ to access a specific column. For example, to get the average species richness in forest plots, we can run:
mean(plots[which(plots$vegetation_type == "forest"), ]$rich)
## [1] 9
We can plot the species richness against the vegetation type, using for example boxplots.
Generally speaking, there are two major ways of plotting in R: Base R and ggplot2. Except for raster data, we will use ggplot2 throughout the tutorials.
library("ggplot2")
ggplot(plots, aes(vegetation_type, rich)) +
geom_boxplot()
# Fine-tuning the boxplot
ggplot(plots, aes(vegetation_type, rich)) +
geom_boxplot(aes(fill = vegetation_type)) +
labs(title = "Basic boxplot",
x = "Vegetation type", y = "Species richness") +
theme_bw() +
theme(legend.position = "bottom")
With base R code, we could do:
boxplot(plots$rich ~ plots$vegetation_type, main = "Basic boxplot",
xlab = "Vegetation type", ylab = "Species richness")
In biogeography and macroecology, we often deal with spatial data. There are different types of spatial data, one of them is shapefiles. Here we are working with the global shapefile of the countries of the world.
The r
library we need to work with shapefiles is
sf
. Make sure to have it installed and to load it in your
session.
# install.packages("sf")
library("sf")
## Warning: package 'sf' was built under R version 4.4.2
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
To get the polygons for all countries, we load two packages containing the data.
library("rnaturalearth")
library("rnaturalearthdata")
##
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
##
## countries110
These packages come with datasets, one of the world country shapes and one with coastlines. We can retrieve them like this:
world <- ne_countries(returnclass = "sf")
coastline <- ne_coastline(returnclass = "sf")
As with any object, we should look at its class and structure:
class(world)
## [1] "sf" "data.frame"
class(coastline)
## [1] "sf" "data.frame"
str(world)
str(coastline)
The geometry
component of the object makes it spatial.
This allows us to make a map of its contents, directly using
ggplot2
with the geometry geom_sf()
.
Different geometries are possible (see a list here), and
we can inspect it with the function st_geometry
:
st_geometry(world)
## Geometry set for 177 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## First 5 geometries:
## MULTIPOLYGON (((180 -16.06713, 180 -16.55522, 1...
## MULTIPOLYGON (((33.90371 -0.95, 34.07262 -1.059...
## MULTIPOLYGON (((-8.66559 27.65643, -8.665124 27...
## MULTIPOLYGON (((-122.84 49, -122.9742 49.00254,...
## MULTIPOLYGON (((-122.84 49, -120 49, -117.0312 ...
st_geometry(coastline)
## Geometry set for 134 features
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -180 ymin: -85.60904 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## First 5 geometries:
## LINESTRING (-163.7129 -78.59567, -163.1058 -78....
## LINESTRING (-6.197885 53.86757, -6.032985 53.15...
## LINESTRING (141.0002 -2.600151, 142.7352 -3.289...
## LINESTRING (114.204 4.525874, 114.6 4.900011, 1...
## LINESTRING (-93.61276 74.98, -94.15691 74.59235...
To plot a sf
object with ggplot
, we simply
need to call the geometry geom_sf
:
ggplot(world) +
geom_sf()
With base R plot, we can run:
plot(st_geometry(world))
To add another sf
layer to a plot, we can simply add a
new geometry with a new data. Let’s try plotting the shapes of the
world’s countries with the coastlines on top:
ggplot(world) +
geom_sf(color = "grey50", fill = "grey80") +
geom_sf(data = coastline, color = "black", fill = "blue") # fill useless here
We can plot only a subset of the `world’ object by applying a classic subset to the object. Let’s plot the countries that fall into the continent of North America:
ggplot(world[which(world$continent == "North America"), ]) +
geom_sf()
Based on the columns subregion
, we can also color each
of these polygons based on their subregions:
ggplot(world[which(world$continent == "North America"), ]) +
geom_sf(aes(fill = subregion))
Note that we are using the aesthetic fill
to color the
inside of each polygon. We could also use color
to have
different outlines based on a different variable. For example:
ggplot(world[which(world$continent == "North America"), ]) +
geom_sf(aes(fill = pop_est,
color = subregion), linewidth = 1) # increasing the outline width
We can also label each polygon based on its country name:
ggplot(world[which(world$continent == "North America"), ]) +
geom_sf(aes(color = subregion,
fill = pop_est)) +
geom_sf_label(aes(label = name), size = 2)
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
Choosing colors is an important topic, and they are some classic mistakes that should be avoided. First, a fair proportion of people are color blind and do not distinguish between certain hues. Second, some color are perceived better by the human eye, artificially biasing our interpretation of figures. The following figure taken from this page, illustrates nicely some of these issues:
Ocean seafloor age map. (a) The rainbow colour scheme adds randomly (at least) two strong artificial boundaries to the data: One along the red-yellow transition (at 0.4 of the colour bar range) and one along the blue-cyan transition (at 0.7 of the colour bar range). At the same time, it hides the local data gradients in the greenish parts of the colour map making it impossible to spot, for example, the large area of same age just north of the Tonga subduction zone. Moreover, it is unreadable for people with most forms of colour blindness, and when printed in black and white. The perceptually-uniform colour maps (b) lajolla and (c) vik prevent all of these problems, where the latter is so-called zero-centred and can be used to specifically highlight the old (brownish) and young (blueish) parts of the ocean floor with a boundary (white) occurring naturally in the middle of the data range.
So what to do? Instead of picking your own colors, pick a color set
designed by a professional. These color palettes are are perceptually
uniform, work with black and white, accessible to colorblind. Two of
these palettes are RColorBrewer
and viridis
,
which come with their own R packages. You can visualize the Color Brewer
palette on this very nice website: https://colorbrewer2.org, and
viridis is also introduced here.
# install.packages("RColorBrewer")
# install.packages("viridis")
library("RColorBrewer")
library("viridis")
## Loading required package: viridisLite
Both of these palettes can be implemented in a ggplot framework very easily:
# RColorBrewer
# Palette names are visible here: RColorBrewer::display.brewer.all()
ggplot(world) +
geom_sf(aes(color = continent, fill = continent)) +
scale_color_brewer(palette = "Set1") +
scale_fill_brewer(palette = "Set1") +
labs(title = "Set1 palette from ColorBrewer")
# viridis
ggplot(world) +
geom_sf(aes(color = continent, fill = continent)) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
labs(title = "Magma palette from viridis")
To go further on this topic, here are some resources: here and here
One mandatory consideration when dealing with spatial data is the Coordinate Reference System (CRS) to be used.
A CRS is made of a datum and a coordinate system. A datum is a reference system which allows the location of latitudes and longitudes (and heights) to be identified onto the surface of the Earth. The coordinate system is a model of our world or projection. CRS can be geographic, in which case they rely on latitudes and longitures (e.g. unprojected WGS84), or projected (mathematical translation to a 2D surface).
What we need to ensure is that our spatial objects have a CRS
defined. To check if a spatial object has a CRS defined, we can run
st_crs
.
st_crs(world)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
As we can see in the last field, the EPSG code for our spatial object
is 4326. EPSG is an acronym for European Petroleum Survey Group. This
group publishes, among other things, data on coordinate systems on map
projections and datums.
EPSG 4326 corresponds to the WGS84 datum (more info here),
and is a geographic CRS, therefore providing latitudes and longitudes,
it is not a projected CRS.
Side-note
WGS84 is the most common CRS in the world so is a convenient baseline
for transforming other data to to be consistent. Commonly used by
organizations that provide GIS data for the entire globe or across many
countries. It is the CRS used by Google Earth as the default.
EPSG identifier A CRS can be defined using a so-called proj4string or a unique EPSG identifier. EPSG identifiers can be searched here: epsg.io, or here: spatialreference.org.
With our map based on the unprojected WGS84, we can observe a high distorsion of country size towards high latitudes. This is also visible on this nice website: https://thetruesize.com/.
Other projections offer different results.
When using a projection, we are making a 2D representation of a 3D
volume. Converting a sphere/ellipsoid to a flat surface results in
distortions, and this implies some trade-offs. You can find many nice
resources describing these trade-offs, for example here
or here
but, in a nutshell, projections can distort four properties: shape,
area, distance and direction.
Projections can preserve the shape of countries, like the famous Mercator projection, the area (e.g. Eckert IV or Mollweide), the distance between two points (e.g. Plate Carrée) or be a compromise (e.g. Winkel Tripel).
We can map our shapefile according to another projection in two
different ways. Either we use do it within the ggplot2
framework using the function coord_sf
, or we directly
transform the data before plotting using
st_transform
.
Below are two ways of doing this with a the Mollweide projection, which
is an equal-area projection preserving the areas of countries.
mollweide <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
ggplot(world) +
geom_sf(aes(fill = continent, color = continent),
show.legend = FALSE) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
coord_sf(crs = mollweide) +
labs(title = "Mollweide (equal-area) projection") +
theme_bw()
Or, if we directly transform the data:
world_mollweide <- st_transform(world, crs = mollweide)
ggplot(world_mollweide) +
geom_sf(aes(fill = continent, color = continent),
show.legend = FALSE) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
labs(title = "Mollweide (equal-area) projection") +
theme_bw()
Chosing a projection depends on the area we are interested in, and on the distortion properties. Here are two links that can help when chosing a projection: here and here.
Below are the identifiers/proj4strings of several widely used projections.
# Conformal
mercator <- 3857 # EPSG identifier
# alternatively, we could use this format:
mercator <- "+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs +type=crs"
# Equal-area
eckertIV <- "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
mollweide <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
# https://semba-blog.netlify.app/01/26/2020/world-map-and-map-projections/
# Equidistant
plate_carree <- "+proj=eqc +lat_ts=0 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs"
# Compromise
robinson <- "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs"
wink <- "+proj=wintri +lon_0=0 +lat_1=50.467 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs" # used by National Geographic
# One suited for North America
albers <- "+proj=aea +lat_0=40 +lon_0=-96 +lat_1=20 +lat_2=60 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +type=crs"
# Orange-peel
good <- "+proj=igh +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs"
# Globe view
globe <- "+proj=laea +x_0=0 +y_0=0 +lon_0=-74 +lat_0=40"
And here is the code to produce a combined plot with 4 different projections:
library("cowplot") # R package used to combine plots
# Combined plot
combined_plot <- plot_grid(
ggplot(world) +
geom_sf(aes(fill = continent, color = continent),
show.legend = FALSE) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
labs(title = "Plate Carrée (equidistant along meridians)") +
theme_bw(),
ggplot(world) +
geom_sf(aes(fill = continent, color = continent),
show.legend = FALSE) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
coord_sf(crs = mollweide) +
labs(title = "Mollweide (equal-area projection)") +
theme_bw(),
ggplot(world) +
geom_sf(aes(fill = continent, color = continent),
show.legend = FALSE) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
coord_sf(crs = robinson) +
labs(title = "Robinson (compromise)") +
theme_bw(),
ggplot(world) +
geom_sf(aes(fill = continent, color = continent),
show.legend = FALSE) +
scale_color_viridis_d(option = "magma") +
scale_fill_viridis_d(option = "magma") +
coord_sf(crs = globe) +
labs(title = "Globe") +
theme_bw(),
nrow = 2)
combined_plot
To export a plot, it is always better to use functions such as
ggsave
rahter than to use the Export button in the plot
window. Functions allow you to change important parameters, such as the
dpi, which is usually a requirement when submitting manuscripts to a
journal, in a reproducible way.
# Saving plot
ggsave(filename = "figures/combined_world_maps.png", plot = combined_plot, width = 20, height = 10, units = "cm", dpi = 300)
Rasters constitute another type of spatial objects. They are spatial matrices made of cells at a certain resolution. They are widely used in biogeography and macroecology, mostly to store environmental data.
In R, we handle them with the terra
library.
Loading and visualising one environmental raster layer
Chelsa - Climate data http://chelsa-climate.org/
Annual
mean temperature (CHELSA_bio10_01.tif)
Extract and copy the rasters into your R-project folder ../data/
Load terra library
# install.packages("terra") # install the terra package in case not done yet
library("terra")
Import climate data from Chelsa
temperature <- rast("data/CHELSA_bio10_01.tif")
plot(temperature, main = "Temperature") # Temperature (*10 °C)
Have a look at characteristics of the raster
layers
Like for shapefiles, raster should have a CRS defined.
crs(temperature)
## [1] "GEOGCRS[\"WGS 84\",\n ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n MEMBER[\"World Geodetic System 1984 (Transit)\"],\n MEMBER[\"World Geodetic System 1984 (G730)\"],\n MEMBER[\"World Geodetic System 1984 (G873)\"],\n MEMBER[\"World Geodetic System 1984 (G1150)\"],\n MEMBER[\"World Geodetic System 1984 (G1674)\"],\n MEMBER[\"World Geodetic System 1984 (G1762)\"],\n MEMBER[\"World Geodetic System 1984 (G2139)\"],\n MEMBER[\"World Geodetic System 1984 (G2296)\"],\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[2.0]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"World.\"],\n BBOX[-90,-180,90,180]],\n ID[\"EPSG\",4326]]"
Raster are spatial matrices, meaning they have a certain number of cells, rows and columns.
ncol(temperature)
## [1] 43200
nrow(temperature)
## [1] 20880
ncell(temperature)
## [1] 902016000
ncol(temperature) * nrow(temperature) == ncell(temperature)
## [1] TRUE
The spatial extent can be accessed with:
ext(temperature)
## SpatExtent : -180.00013888885, 179.99985967115, -90.00013888885, 83.99986041515 (xmin, xmax, ymin, ymax)
Each cell has a certain resolution, which we can access with the following function:
res(temperature)
## [1] 0.008333333 0.008333333
The unit is decimal degree. One arc-degree is equivalent to 3600
arc-seconds, so 0.008333333 decimal degrees do correspond to 30
arc-second, which is roughly 1 km at the Equator.
An online converter is accessible here.
The resolution of a raster layer can be coarsened. Here we coarsen it by a factor 60.
temperature_30min <- aggregate(temperature, fact = 60, fun = mean, na.rm = TRUE)
dim(temperature)
## [1] 20880 43200 1
dim(temperature_30min)
## [1] 348 720 1
res(temperature)
## [1] 0.008333333 0.008333333
res(temperature_30min)
## [1] 0.5 0.5
We now have a resolution of 0.5 degree, so 1800 arc-second, so 1800/60=30 arc minute, which is roughly 55km at the Equator.
You can do calculations with raster layers
# Careful! this is computer intense and might take a long time
temperature_30min <- temperature_30min/10
plot(temperature_30min, main = "Temperature",
col = viridis(99, option = "C")) # Temperature (°C)
points(9.916, 51.541, pch = 4) # Göttingen
text(9.916, 51.541, "You are here", pos = 4)
As for shapefiles, raster layers can be reprojected. We here reproject the temperature raster to the Mollweide projection.
temperature_mollweide <- project(x = temperature_30min, y = mollweide)
plot(temperature_mollweide, main = "Mollweide projection",
col = viridis(99, option = "C"))