library(sf)
library(osmdata)
library(ggplot2)
library(magrittr) # I love the pipe %>%
library(jsonlite) # Read the running file
library(scales) # For the function scales::ordinal()
I am enjoying the end of the year 2021 in Cambridge, hosted as a visiting fellow at the McDonald Institute for Archaelogical Research. For an article I am about to submit, I need a situation map showing where the Karabel relief is located. A perfect occasion to use OpenStreetMap data with R, but it demands to create a static exhibit. Eventually, I found myself mapping my last run instead of working on the situation map. It was more compelling this way to learn the how-tos and to go through the documentation of the different packages. In short, this post is about how I (mis)used the osmdata package combined with sf and ggplot2 to make a quick map of dummy data. It starts with the loading of running data undergoing some data wrangling. This will give us some stats to add to the plot, such as the distance covered or the pace. Then it goes to import OSM data and how to plot everything together via sf+ggplot2.
Set-up
Load the packages (“stopifnot
installed”)
I import some colour keys for the plot from my file def_colours.R. Colour keys are stored in a separate file so it is easier to reuse. I tried to make colour names meaningful, so I may remember them. It happens to be a reccurent problem to me….
source("./R_scripts/def_colours.R")
We will need some running data. Originally, my trace was recorded with a GPS device, but I am logging my runs and biking trips with GoldenCheetah. It is free software, and you can use it offline (for me, this means owning your data). I used the export function to JSON that I can now import into the object result
.
<- "media/2021-11-27_goldencheetah_export.json"
run_file <- fromJSON(run_file) result
Now, that we imported the object ride
into R
, let us keep only the main data.frame, called “SAMPLES”:
<- result$RIDE$SAMPLES ride
It has the following variables: SECS, KM, KPH, ALT, LAT, LON, SLOPE, TEMP. For convenience, I withdraw some data and do some formatting
## Date & Time
<- as.POSIXct(result$RIDE$STARTTIME)
ride_start <- format(ride_start, format="%B %d (%Y)")
ride_start_day <- format(ride_start, format="%H:%M")
ride_start_hour # Temperature
<- paste0(round(mean(ride$TEMP, na.rm = TRUE), 2), "°C")
ride_mean_t # IDs
<- result$RIDE$TAGS$Athlete
ride_athtlete <- result$RIDE$TAGS$Sport ride_sport
And I transform the ride data.frame into sf
<- ride[ride$LAT>0 & ride$LON>0,] # Filter missing data
ride <- st_as_sf(ride, coords=c("LON","LAT"))
ride st_crs(ride) = 4326
In the first version of the map, I plotted the distance covered every 10 minutes and created e10m
. Then I changed my mind and wanted to plot my pace for every kilometre. In both cases, I create directly a label in a variable LABEL
.
# Every 10min (kept a record, but not used in the plot)
<- c(seq(1, max(ride$SECS),600))
e10m <- ride[ride$SECS %in% e10m,"KM"]
e10m_distance $LABEL <- paste0(
e10m_distanceround(e10m_distance$KM, 1), "km in ", round(e10m/60,0), "min")
# Every 10k
<- 1:max(ride$KM)
seq_km <- scales::ordinal(seq_km)
seq_km_ordi
# Match every (rounded) kilometre and
<- ride[round(ride$KM,1) %in% seq_km, ]
pek # Delete duplicate (take first of km)
<- pek[!duplicated(round(pek$KM,1)),]
pek $LABEL <- paste0(
pek
seq_km_ordi," k pace\n ",
round(c(pek$SECS[1]/60, diff(pek$SECS)/60), 1),"min/km")
This will give label such as “1st k pace 5.8min/km” (meaning the pace for the 1st kilometre).
Take it on line
The data are recorded as a series of points for every second, but we want to plot the ride as a continuous line, and we keep only the coordinates.
# Make it a line
<- st_sf(st_sfc(st_linestring(st_coordinates(ride))))
ride_line st_crs(ride_line) = 4326
With the line, it is easy to compute the length of the ride and from there the average speed
<- st_length(ride_line)
ride_distance <-
ride_distancek paste0(round(ride_distance/1000, 2), "km")
<-
ride_average_speed paste0(round(
/1000) /
(ride_distancemax(ride$SECS)/3600),2), "km/h") (
ride_average_speed
? 9.71km/h? Not too bad. Now, the data of the ride are ready, we will need to get geographical information.
Using osmdata
With osmdata
, the most convenient way I found to access the OSM data was first to download the data from the area of my ride and then to extract features. I create different objects mimicking a layer paradigm for the plot.
Where did I run? We compute the bounding box for the plot by expanding of 15% the limit of the ride bounding box
<- st_bbox(ride_line)
ride_box
# I created a function, it is a task I already did earlier
<- function(bbox_obj, percentage) {
expand_box <- diff(bbox_obj[c("xmin","xmax")])*percentage
to_x <- diff(bbox_obj[c("ymin","ymax")])*percentage
to_y c("xmin")] <- bbox_obj[c("xmin")] - to_x
bbox_obj[c("xmax")] <- bbox_obj[c("xmax")] + to_x
bbox_obj[c("ymin")] <- bbox_obj[c("ymin")] - to_y
bbox_obj[c("ymax")] <- bbox_obj[c("ymax")] + to_y
bbox_obj[return(bbox_obj)
}<- expand_box(ride_box, 0.15) ride_box
Now I use the box to download the OSM data within the box
# Here file is saved to ease reproducibility
<- "data/osm_query_box.Rdata"
osm_data_file if (!file.exists(osm_data_file)){
<- ride_box %>% opq(bbox = .)
query_box <- query_box %>% osmdata_sf ()
q saveRDS(object = q,
file = osm_data_file)
else {
} <- readRDS(file = osm_data_file)
q }
This object contains various geometries (points, lines, polygons, and more). The aim of the next chunk of code is to extract the different elements into different objects. I create a convenient function cull_osm
to delete empty (NAs) column when a specific feature is subset (copy-paste from mnel posted on SO). It does not bring anything, but I prefer to have neat data.
<- function(osm_subset){
cull_osm Filter(function(x)!all(is.na(x)), osm_subset)
}
<- q$osm_multipolygons %>%
boundaries subset(type=="boundary") %>% cull_osm
<- q$osm_polygons %>%
farmlands subset(landuse=="farmland") %>% cull_osm
<- q$osm_polygons %>%
meadows subset(landuse %in% c("meadow","grass","recreation_ground"))%>% cull_osm
<- q$osm_polygons %>%
residentials subset(landuse %in% c("residential","allotments")) %>% cull_osm
<- q$osm_polygons %>%
industrials subset(landuse %in% c("industrial","commercial")) %>% cull_osm
<- q$osm_polygons %>%
leisure subset(!is.na(leisure))%>% cull_osm
<- q$osm_polygons %>%
buildings subset(!is.na(building)) %>% cull_osm
<- q$osm_polygons %>%
naturals subset(!is.na(natural)) %>% cull_osm
<- q$osm_lines %>%
waterways subset(!is.na(waterway))%>% cull_osm
<-q$osm_multilines %>%
routes subset(type=="route") %>% cull_osm
<- q$osm_lines %>%
highways subset(highway %in%
c("tertiary","residential","service")) %>% cull_osm
<- q$osm_lines %>%
footways subset(highway %in% c("footway","bridleway",
"track","path")) %>% cull_osm
<- q$osm_lines %>%
railways subset(railway %in% c("rail")) %>% cull_osm
# `osmdata` downloads the features inside the bbox. I was
# looking for a way to get some geographical names. I tried
# to crop the administrative boundaries, to plot the label
# using the centroid of each features
# boundaries <- st_crop(boundaries, st_bbox(ride_box))
# but I finally took the points with a place name
<- q$osm_points %>%
names subset(!is.na(place)) %>% cull_osm
From there, I do a dummy stack of the features with specific colouring and sizing, adding a minimum of labels and some info
<-
ggmap_random_meadow_run ggplot(data=result$RIDE$SAMPLE) +
geom_sf(data = boundaries,
fill = yellow_endive, color = "transparent",
alpha = 0.3) +
geom_sf(data = industrials,
fill = grey_slate, color = "transparent",
alpha = 0.4) +
geom_sf(data = residentials,
fill = grey_conglomerate, color = "transparent",
alpha = 0.4) +
geom_sf(data = meadows,
fill = green_meadow, color = "transparent",
alpha = 0.4) +
geom_sf(data = farmlands,
fill = brown_farmland, color = "transparent",
alpha = 0.4) +
geom_sf(data = leisure,
fill = green_leisure, color = "transparent",
alpha = 0.4) +
geom_sf(data = buildings,
fill = "black", color = "black",
alpha = 0.7) +
geom_sf(data = highways,
color = grey_highway,
alpha = 0.7, size = 1) +
geom_sf(data = waterways,
colour = blue_waterway,
alpha = 0.75, size = 0.6) +
geom_sf(data = footways, colour = orange_baseball,
size = 0.7, lty = "twodash", alpha = 0.6) +
geom_sf(data = naturals, fill = green_nature,
alpha = 0.2, size = 0) +
geom_sf(data = railways, colour = brown_rail,
alpha = 0.99, size = 1.2) +
geom_sf(data = railways, colour = brown_split_rail,
alpha = 0.91, size = 1,
lty = "dotdash") +
geom_sf_label(data = names,
aes(label = name), alpha = 0.7,
hjust = 0.0, vjust = 0, fill = "black",
col = "white", size = 2.9) +
xlim(st_bbox(ride_box)[c(1,3)]) +
ylim(st_bbox(ride_box)[c(2,4)])
Let’s have a preview of the data
Ok, it is time to add the layers with the ride trace and the stats. I am changing the theme and the general layout, but this is only to improve my ggplot grammar by training.
+
ggmap_random_meadow_run geom_sf(data = ride_line,
colour = "black",
size = 1.8, alpha = 0.9) +
geom_sf(data = ride_line,
colour = yellow_ride,
size = 1.5, alpha = 1) +
#geom_sf(data = e10m_distance,
# colour = yellow_ride,
# fill = orange_baseball,
# size=5.6, pch=21, alpha=0.9) +
# geom_sf_text(data=e10m_distance, aes(label=KM)) +
geom_sf_label(data = pek,
aes(label = LABEL),
fill = yellow_brown_palette(nrow(pek)),
size = 2.9, alpha = 0.9, hjust = -0.1,
vjust = 1) +
geom_sf(data = pek,
colour = "black",
fill = bleu_de_france,
size = 1.8, pch = 21, alpha = 0.8) +
geom_sf_label(data = ride[1,],
aes(label = "Start"),
fill = "#ffcccc",
alpha = 0.9) +
labs(title = paste("Map of the", (ride_sport),
"Enjoyed", ride_start_day),
subtitle = paste("Started at", ride_start_hour,
"for a distance of", ride_distancek,
"by", ride_mean_t),
caption = paste("Data: OpenStreetMap","/",
"Software:", "R+osmdata+sf+ggplot2", "/",
"Author:", "Néhémie")
+
) geom_text(data=data.frame(),
aes(label = paste0("Average speed:",
ride_average_speed),x = Inf,
y = -Inf),
hjust = 1.01,
vjust = -0.2) +
ylab("") +
xlab("")+
theme_minimal() +
theme(axis.text = element_text(angle = 45,
vjust = 0.5,
hjust = 1,
colour = "grey"),
axis.title.x = element_text(angle = 0,
vjust = 0.5,
colour = "grey"),
axis.title.y = element_text(angle = 0,
vjust = 0.5,
colour = "grey"),
### Background
plot.background = element_rect(fill = "#f8f8f8",
colour = "#f9f9f9"),
panel.background =
element_rect(fill = white_paper,
colour = white_paper_border,
size = 0.5, linetype = "solid"),
panel.grid.major =
element_line(size = 0.1,
linetype = 'dashed',
colour = white_paper_border),
panel.grid.minor =
element_line(size = 0.25,
linetype = 'solid',
colour = "black")
# Done! )
I am not convinced with this plot, but I think it is a decent start.
Testing replicability
When I was writing these lines, I thought I should go and record another run to see how my code performs. Originally, I wanted to run the same track, but I lost myself and I did a slightly different route. Well, it’s probably better for the replicability (and not only the reproducibility of the experiment). This next map is the result if you run (ahem) the code presented above (script here), only by swapping the data exported from GoldenCheetah.
source("R_scripts/2022-12-02_plot_my_run.R")
plot_my_run("media/2021-12-02_goldencheetah_export.json")
Well, a lot of overlays. The key takeaway from this exercise is obviously that I should not circuit! Or being faster at it… Otherwise, if the map is static (a requirement for the situation map I want to do), reducing the amount of text and put the kilometres (1st, 2nd, 3rd) into the point. Likewise, for circuit, I should change the orientation of the label (not too difficult to orient them according to the location in the map: left goes left, top goes up, and so on). At least, there is room for improvement in my running, orientation, and coding skills!