Geospatial/SAE workshop

Raster files

Josh Merfeld

October 2024

Goal

  • By the end of the day, we want to be able to:
    • Use geospatial data to estimate a sub-area model
    • We won’t actually do that today, we’ll do that tomorrow
    • Today focus is getting ALL of the data we need

Rasters

  • We’ve discussed shapefiles
    • Now let’s talk about rasters!

  • Rasters are a different type of geospatial data
    • They are made up of a grid of cells
    • Each cell has a value

Example raster grid - how much info do we need?

  • Here’s a grid.
    • How many points do we need?

Example raster grid - how much info do we need?

  • Need to know location of one grid cell…
    • And the size of each grid!

How much info do we need?

  • In other words, we do not need a point for every raster cell

  • We just need to know:

    • The location of one cell
    • The size of each cell
      • This is called the resolution of the raster

  • Example:

    • I know the first grid cell in bottom left is at (0, 0)
    • I know each grid cell is 1 meter by 1 meter (the resolution)
    • Then I know the exact location of every single grid cell

Population in Cotonou, Benin

  • What are the white values?

Population in Cotonou, Benin

  • Here’s the information for this raster
    • What’s the resolution? What are the units?
class       : SpatRaster 
dimensions  : 34, 46, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : 2.39375, 2.432083, 6.342917, 6.37125  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : beninpop.tif 
name        : beninpop 

Rasters

  • Rasters are defined by the grid layout and the resolution
    • Grid cells are sometimes called pixels (just like images, which are often rasters!)

  • There are many different file types for rasters
    • .tif or .tiff (one of the most common)
    • .nc (NetCDF, common for very large raster data)
    • Image files, e.g. png, jpg, etc.

Reading rasters in R

  • Reading rasters is also quite easy!
    • Going to use the terra package for it
      • Note: can use terra for shapefiles, too
    • rastersdata/beninpop.tif is a raster of population counts in Benin
Code
library(terra)

# this is the raster for Cotonou, Benin
cotonou <- rast("rastersdata/beninpop.tif")
cotonou
class       : SpatRaster 
dimensions  : 34, 46, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : 2.39375, 2.432083, 6.342917, 6.37125  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : beninpop.tif 
name        : beninpop 

Plotting rasters

  • Plotting rasters only with terra is a bit of a pain
    • Can’t use ggplot
    • So, I load another package that lets me use ggplot with rasters
      • tidyterra
Code
library(tidyterra)

ggplot() +
  geom_spatraster(data = cotonou) + 
  theme_bw() +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Making it nicer

Code
library(tidyterra)

ggplot() +
  geom_spatraster(data = cotonou) + 
  # distiller is for continuous values
  # but we can use palettes!
  # I like spectral a lot
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Cotonou, Benin") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Extracting raster data to shapefiles

  • Let’s go back to our use case:
    • We want to estimate a sub-area model at the EA level in Malawi
    • This means we need to extract raster data to the EA level
    • We can do this with terra, sf, and exactextractr
      • terra has its own method, but i find exactextractr to be MUCH faster

  • Let’s start by looking at the raster I’ve uploaded to the rastersdata: mwpop.tif

Give it a try

  • Try to load it into R using terra, then plot it with tidyterra and ggplot
Code
tif <- rast("rastersdata/mwpop.tif")

ggplot() +
  geom_spatraster(data = tif) + 
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Northern Malawi") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Give it a try

  • I actually don’t like that map! It’s too hard to see because of all the low values.
  • So let’s take logs, instead!
    • Note that all the zeros become missing (can’t log zero)
Code
tif <- rast("rastersdata/mwpop.tif")

ggplot() +
  geom_spatraster(data = log(tif)) + 
  scale_fill_distiller("Population\ncount (log)", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Northern Malawi") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

We want to extract the .tif values to the .shp

Let’s do it with exactextractr

Code
library(exactextractr)

tif <- rast("rastersdata/mwpop.tif")
adm4 <- read_sf("rastersdata/mw4.shp")
# make sure they are in the same CRS! (they already are, but just in case)
# st_transform is for the sf object
adm4 <- st_transform(adm4, crs = crs(tif))

# extract the raster values to the shapefile
# we are going to SUM, and add the EA_CODE from the shapefile to the result
extracted <- exact_extract(tif, adm4, fun = "sum", append_cols = "EA_CODE")
Code
head(extracted)
   EA_CODE       sum
1 10507801 1068.7787
2 10507072  695.8013
3 10507010  938.1139
4 10507001  749.6960
5 10507009  597.5432
6 10507033  474.3934

Now we can join the extracted data to the shapefile

Code
# join
adm4 <- adm4 |>
  left_join(extracted, by = "EA_CODE")

# plot it!
ggplot() +
  geom_sf(data = adm4, aes(fill = sum), 
    color = "black", lwd = 0.01) +
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in EAs") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

Now it’s your turn

  • Here’s your task:

Now it’s your turn

  • Here’s your task:
    • Search for “worldpop population counts”
    • Scroll down the page, click on “unconstrained individual countries 2000-2020 UN adjusted (1km resolution)
    • Then, search for a country (maybe yours?)

Now it’s your turn

  • Here’s your task:
    • Search for “worldpop population counts”
    • Scroll down the page, click on “unconstrained individual countries 2000-2020 UN adjusted (1km resolution)
    • Then, search for a country (maybe yours?)
    • Click on “Data & Resources” for 2020
    • Scroll down to the bottom of the page and download the .tif

Now it’s your turn

  • Load the .tif into R using terra
  • Plot the raster using tidyterra and ggplot
    • Make it look nice!

Let’s keep going!

  • Now you need to find a shapefile for the same country
  • This will be a bit less straightforward
    • Search for “shapefile COUNTRY humdata”
    • You should find a link to the Humanitarian Data Exchange
    • Click on it and see if it has shapefiles for your country of choice
    • If so, download a shapefile (it can be at a higher admin level)
      • If not, raise your hand and I’ll come help you find a shapefile
    • Load it into R and plot it!

One last thing

  • You have the population tif and the shapefile
  • Extract the population data (using sum, don’t forget!) to the shapefile
    • Use append_cols and make sure you choose the correct identifier!
  • Join the data to the shapefile
  • Plot the shapefile with the population data
    • Make it look nice!

What can you do with that data?

  • Now you have a shapefile with population data
  • You can save it as a .csv and use it in your analysis!
    • We’ll get to this point eventually.
    • We will also discuss adding the survey data and then estimating a sub-area model

Creating a grid

  • Yesterday, we used a grid in Korea
    • kgrid.shp
  • By now, you can probably see that a grid is very similar to a raster!

Load the shapefile

  • Let’s load kshape.shp
Code
kshape <- vect("rastersdata/kshape.shp")
kgrid <- rast(kshape, res = 5000)
kgrid <- as.polygons(kgrid)
kgrid$id <- 1:nrow(kgrid)

The grid

Not quite done

  • We aren’t quite done. What do we want to do now?
Code
intersection <- intersect(kshape, kgrid)
kgrid <- kgrid |>
  filter(id %in% intersection$id)

Not quite done

Where can you find rasters?

  • Depending on the variable, rasters are sometimes quite easy to find!
    • We already saw one example: WorldPop (population counts)

  • There are two large online repositories:

Google Earth Engine

  • Google Earth Engine has a lot of data.

  • Let’s see some examples