Néhémie’s Travelogue

I am enjoying the end of the year 2021 in Cambridge, hosted as a visiting fellow at the McDonald Institute for Archaelogical Research. For an article I am about to submit, I need a situation map showing where the Karabel relief is located. A perfect occasion to use OpenStreetMap data with R, but it demands to create a static exhibit. Eventually, I found myself mapping my last run instead of working on the situation map. It was more compelling this way to learn the how-tos and to go through the documentation of the different packages. In short, this post is about how I (mis)used the osmdata package combined with sf and ggplot2 to make a quick map of dummy data. It starts with the loading of running data undergoing some data wrangling. This will give us some stats to add to the plot, such as the distance covered or the pace. Then it goes to import OSM data and how to plot everything together via sf+ggplot2.

Set-up

Load the packages (“stopifnot installed”)

library(sf)
library(osmdata)
library(ggplot2)
library(magrittr) # I love the pipe %>%
library(jsonlite) # Read the running file
library(scales)   # For the function scales::ordinal()

I import some colour keys for the plot from my file def_colours.R. Colour keys are stored in a separate file so it is easier to reuse. I tried to make colour names meaningful, so I may remember them. It happens to be a reccurent problem to me….

source("./R_scripts/def_colours.R")

We will need some running data. Originally, my trace was recorded with a GPS device, but I am logging my runs and biking trips with GoldenCheetah. It is free software, and you can use it offline (for me, this means owning your data). I used the export function to JSON that I can now import into the object result.

run_file <- "media/2021-11-27_goldencheetah_export.json"
result <- fromJSON(run_file)

Now, that we imported the object ride into R, let us keep only the main data.frame, called “SAMPLES”:

ride <- result$RIDE$SAMPLES

It has the following variables: SECS, KM, KPH, ALT, LAT, LON, SLOPE, TEMP. For convenience, I withdraw some data and do some formatting

## Date & Time
ride_start  <- as.POSIXct(result$RIDE$STARTTIME)
ride_start_day <- format(ride_start, format="%B %d (%Y)")
ride_start_hour <- format(ride_start, format="%H:%M")
# Temperature
ride_mean_t <- paste0(round(mean(ride$TEMP, na.rm = TRUE), 2), "°C")
# IDs
ride_athtlete <- result$RIDE$TAGS$Athlete
ride_sport <- result$RIDE$TAGS$Sport

And I transform the ride data.frame into sf

ride <- ride[ride$LAT>0 & ride$LON>0,] # Filter missing data
ride <- st_as_sf(ride, coords=c("LON","LAT"))
st_crs(ride) = 4326

In the first version of the map, I plotted the distance covered every 10 minutes and created e10m . Then I changed my mind and wanted to plot my pace for every kilometre. In both cases, I create directly a label in a variable LABEL.

# Every 10min (kept a record, but not used in the plot)
e10m <- c(seq(1, max(ride$SECS),600))
e10m_distance <- ride[ride$SECS %in% e10m,"KM"]
e10m_distance$LABEL <- paste0(
     round(e10m_distance$KM, 1), "km in ", round(e10m/60,0), "min")

# Every 10k
seq_km <- 1:max(ride$KM)
seq_km_ordi <- scales::ordinal(seq_km)

# Match every (rounded) kilometre and
pek <- ride[round(ride$KM,1) %in% seq_km, ]
# Delete duplicate (take first of km)
pek <- pek[!duplicated(round(pek$KM,1)),]
pek$LABEL <- paste0(
     seq_km_ordi,
     " k pace\n ",
     round(c(pek$SECS[1]/60, diff(pek$SECS)/60), 1),"min/km")

This will give label such as “1st k pace 5.8min/km” (meaning the pace for the 1st kilometre).

Take it on line

The data are recorded as a series of points for every second, but we want to plot the ride as a continuous line, and we keep only the coordinates.

# Make it a line
ride_line <- st_sf(st_sfc(st_linestring(st_coordinates(ride))))
st_crs(ride_line) = 4326

With the line, it is easy to compute the length of the ride and from there the average speed

ride_distance <-  st_length(ride_line)
ride_distancek  <-
  paste0(round(ride_distance/1000, 2), "km")
ride_average_speed <-
  paste0(round(
          (ride_distance/1000) /
               (max(ride$SECS)/3600),2), "km/h")

ride_average_speed? 9.71km/h? Not too bad. Now, the data of the ride are ready, we will need to get geographical information.

Using `osmdata`

With osmdata, the most convenient way I found to access the OSM data was first to download the data from the area of my ride and then to extract features. I create different objects mimicking a layer paradigm for the plot.

Where did I run? We compute the bounding box for the plot by expanding of 15% the limit of the ride bounding box

ride_box <- st_bbox(ride_line)

# I created a function, it is a task I already did earlier
expand_box <- function(bbox_obj, percentage) {
  to_x <- diff(bbox_obj[c("xmin","xmax")])*percentage
  to_y <- diff(bbox_obj[c("ymin","ymax")])*percentage
  bbox_obj[c("xmin")] <- bbox_obj[c("xmin")] - to_x
  bbox_obj[c("xmax")] <- bbox_obj[c("xmax")] + to_x
  bbox_obj[c("ymin")] <- bbox_obj[c("ymin")] - to_y
  bbox_obj[c("ymax")] <- bbox_obj[c("ymax")] + to_y
  return(bbox_obj)
}
ride_box <- expand_box(ride_box, 0.15)

Now I use the box to download the OSM data within the box

# Here file is saved to ease reproducibility

osm_data_file <- "data/osm_query_box.Rdata"
if (!file.exists(osm_data_file)){
 query_box  <- ride_box  %>% opq(bbox = .)
 q <- query_box %>% osmdata_sf ()
 saveRDS(object = q,
         file = osm_data_file)
} else {
 q <- readRDS(file = osm_data_file)
}

This object contains various geometries (points, lines, polygons, and more). The aim of the next chunk of code is to extract the different elements into different objects. I create a convenient function cull_osm to delete empty (NAs) column when a specific feature is subset (copy-paste from mnel posted on SO). It does not bring anything, but I prefer to have neat data.

cull_osm <- function(osm_subset){
  Filter(function(x)!all(is.na(x)), osm_subset)
}

boundaries <- q$osm_multipolygons %>%
  subset(type=="boundary") %>% cull_osm


farmlands <- q$osm_polygons %>%
  subset(landuse=="farmland") %>% cull_osm

meadows <- q$osm_polygons %>%
  subset(landuse %in% c("meadow","grass","recreation_ground"))%>% cull_osm

residentials <- q$osm_polygons %>%
  subset(landuse %in% c("residential","allotments")) %>% cull_osm

industrials <- q$osm_polygons %>%
  subset(landuse %in% c("industrial","commercial")) %>% cull_osm

leisure <- q$osm_polygons %>%
  subset(!is.na(leisure))%>% cull_osm

buildings <- q$osm_polygons %>%
  subset(!is.na(building)) %>% cull_osm

naturals <- q$osm_polygons %>%
  subset(!is.na(natural)) %>% cull_osm

waterways <- q$osm_lines %>%
  subset(!is.na(waterway))%>% cull_osm

routes  <-q$osm_multilines %>%
  subset(type=="route") %>% cull_osm

highways <- q$osm_lines %>%
   subset(highway %in%
          c("tertiary","residential","service")) %>% cull_osm

footways <- q$osm_lines   %>%
  subset(highway %in% c("footway","bridleway",
                        "track","path")) %>% cull_osm

railways <- q$osm_lines %>%
   subset(railway %in% c("rail")) %>% cull_osm

# `osmdata` downloads the features inside the bbox. I was
# looking for a way to get some geographical names. I tried
# to crop the administrative boundaries, to plot the label
# using the centroid of each features
#     boundaries <- st_crop(boundaries, st_bbox(ride_box))
# but I finally took the points with a place name

names <- q$osm_points %>%
  subset(!is.na(place)) %>% cull_osm

From there, I do a dummy stack of the features with specific colouring and sizing, adding a minimum of labels and some info

ggmap_random_meadow_run <-
  ggplot(data=result$RIDE$SAMPLE) +
  geom_sf(data = boundaries,
          fill = yellow_endive, color = "transparent",
          alpha = 0.3) +
  geom_sf(data = industrials,
          fill = grey_slate, color = "transparent",
          alpha = 0.4) +
  geom_sf(data = residentials,
          fill = grey_conglomerate, color = "transparent",
          alpha = 0.4) +
  geom_sf(data = meadows,
          fill = green_meadow, color = "transparent",
          alpha = 0.4) +
  geom_sf(data = farmlands,
          fill = brown_farmland, color = "transparent",
          alpha = 0.4) +
  geom_sf(data = leisure,
          fill = green_leisure, color = "transparent",
          alpha = 0.4) +
  geom_sf(data = buildings,
          fill = "black", color = "black",
          alpha = 0.7) +
  geom_sf(data = highways,
          color = grey_highway,
          alpha = 0.7, size = 1) +
  geom_sf(data = waterways,
          colour = blue_waterway,
          alpha = 0.75, size = 0.6) +
  geom_sf(data = footways, colour = orange_baseball,
          size = 0.7, lty = "twodash", alpha = 0.6) +
  geom_sf(data = naturals, fill = green_nature,
          alpha = 0.2,  size = 0) +
  geom_sf(data = railways, colour = brown_rail,
          alpha = 0.99, size = 1.2) +
  geom_sf(data = railways, colour = brown_split_rail,
          alpha = 0.91, size = 1,
          lty = "dotdash") +
  geom_sf_label(data = names,
                aes(label = name), alpha = 0.7,
               hjust = 0.0, vjust = 0, fill = "black",
               col = "white", size = 2.9) +
  xlim(st_bbox(ride_box)[c(1,3)]) +
  ylim(st_bbox(ride_box)[c(2,4)])

Let’s have a preview of the data

Ok, it is time to add the layers with the ride trace and the stats. I am changing the theme and the general layout, but this is only to improve my ggplot grammar by training.

ggmap_random_meadow_run +
  geom_sf(data = ride_line,
          colour = "black",
          size = 1.8, alpha = 0.9) +
  geom_sf(data = ride_line,
          colour = yellow_ride,
          size = 1.5, alpha = 1) +
  #geom_sf(data = e10m_distance,
  #        colour = yellow_ride,
  #        fill = orange_baseball,
  #        size=5.6, pch=21, alpha=0.9) +
  #   geom_sf_text(data=e10m_distance, aes(label=KM)) +
  geom_sf_label(data = pek,
                aes(label = LABEL),
                fill = yellow_brown_palette(nrow(pek)),
                size = 2.9, alpha = 0.9, hjust = -0.1,
                vjust = 1) +
  geom_sf(data = pek,
          colour = "black",
          fill = bleu_de_france,
          size = 1.8, pch = 21, alpha = 0.8) +
  geom_sf_label(data = ride[1,],
                aes(label = "Start"),
                    fill = "#ffcccc",
                alpha = 0.9) +
  labs(title = paste("Map of the", (ride_sport),
                     "Enjoyed", ride_start_day),
    subtitle = paste("Started at", ride_start_hour,
                     "for a distance of", ride_distancek,
                     "by", ride_mean_t),
     caption = paste("Data: OpenStreetMap","/",
                     "Software:", "R+osmdata+sf+ggplot2", "/",
                     "Author:", "Néhémie")
  ) +
   geom_text(data=data.frame(),
             aes(label = paste0("Average speed:",
                                ride_average_speed),
                 x = Inf,
                 y = -Inf),
                 hjust = 1.01,
                 vjust = -0.2) +
  ylab("") +
  xlab("")+
  theme_minimal() +
         theme(axis.text = element_text(angle = 45,
                                           vjust = 0.5,
                                           hjust = 1,
                                           colour = "grey"),
               axis.title.x = element_text(angle = 0,
                                           vjust = 0.5,
                                           colour = "grey"),
               axis.title.y = element_text(angle = 0,
                                           vjust = 0.5,
                                           colour = "grey"),
    ### Background
    plot.background = element_rect(fill = "#f8f8f8",
                                   colour = "#f9f9f9"),
    panel.background =
      element_rect(fill = white_paper,
                   colour = white_paper_border,
                   size = 0.5, linetype = "solid"),
    panel.grid.major =
      element_line(size = 0.1,
                   linetype = 'dashed',
                   colour = white_paper_border),
    panel.grid.minor =
      element_line(size = 0.25,
                   linetype = 'solid',
                   colour = "black")
         ) # Done!

I am not convinced with this plot, but I think it is a decent start.

Testing replicability

When I was writing these lines, I thought I should go and record another run to see how my code performs. Originally, I wanted to run the same track, but I lost myself and I did a slightly different route. Well, it’s probably better for the replicability (and not only the reproducibility of the experiment). This next map is the result if you run (ahem) the code presented above (script here), only by swapping the data exported from GoldenCheetah.

source("R_scripts/2022-12-02_plot_my_run.R")

plot_my_run("media/2021-12-02_goldencheetah_export.json")

Well, a lot of overlays. The key takeaway from this exercise is obviously that I should not circuit! Or being faster at it… Otherwise, if the map is static (a requirement for the situation map I want to do), reducing the amount of text and put the kilometres (1st, 2nd, 3rd) into the point. Likewise, for circuit, I should change the orientation of the label (not too difficult to orient them according to the location in the map: left goes left, top goes up, and so on). At least, there is room for improvement in my running, orientation, and coding skills!

Set-up

Take it on line

Using osmdata

Testing replicability

Using `osmdata`