Turning KML into tidy data frames

This note briefly introduces the tidykml package, which turns basic KML geometries into tidy data frames that can be visualized with ggplot2.

Summary

The tidykml package provides a quick way to import data from Google My Maps into R, in a format that makes it easy to manipulate the data and visualize it with ggplot2.

Below is an example that uses data from a map of the U.S. Civil War.

The rest of this note expands on some of the ideas outlined in the README file of the tidykml package.

The problem

Some of the maps available through Google My Maps contain extremely detailed information about things like non-Hispanic gangs in South Los Angeles. (If you are interested in the topic, Hispanic gangs are there too, and the relevant Reddit thread links to even more detailed maps covering even more territory.)

A problem, however, is that there is no straightforward way to reuse the data from these maps in R. Google My Maps exports its maps in KML, and while several packages, such as rgdal and sf, can read the Simple Features that make up a KML file, these packages do not (yet) provide methods to easily pass the data to ggplot2.

The state of the art

To be clear, it has long been possible to plot spatial data with ggplot2, and the ggmap package makes several static map sources easily available from within R, for use with ggplot2 or with other packages. Furthermore, it is possible to plot Simple Features with other tools, such as Leaflet, a JavaScript library for interactive maps for which there is a corresponding R package. As a result, there are tons of examples online of both static and interactive maps built with R.

Still, solutions for plotting spatial data with ggplot2 remain complex and somewhat experimental. One solution currently in development is the ggspatial package, which makes use of Michael D. Sumner and Kohske Takahashi's ggpolypath package. Michael D. Sumner is also developing a suite of packages, spbabel and spdplyr, to turn spatial data into tidy data frames.

Spatial data are complex to plot: they do not fit nicely in rectangular datasets, they make use of several coordinate systems, and they mix raster and vector information more often than most other kinds of data. However, a lot of people are working on spatial data visualization, and places like R-sig-geo or GIS Stack Exchange contain many helpful threads on the topic.

A temporary solution

After checking the sources mentioned above, I decided that KML files downloaded from Google My Maps deserved their own little experimental package. The aim of the package would be to go as quickly as possible from the KML file to ggplot2, which meant coding some sort of fortify method for KML data.

Again, it is important to stress that there are already some methods to read spatial data with ggplot2. However, I wanted something that would be tailored to the kind of data provided on Google My Maps, and therefore ended up writing a bunch of xml2 wrappers to read KML into tidy data frames.
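To give a rough idea of what reading KML into a tidy table involves, here is a minimal sketch. It is written in Python with only the standard library rather than in R with xml2, and it is not the package's actual code: the sample KML, the coordinates, and the function name kml_points_to_rows are all made up for illustration. Each Point placemark becomes one row, with the folder (layer) name carried along as a column.

```python
import xml.etree.ElementTree as ET

# KML 2.2 elements live in this XML namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# A tiny, invented KML document (not taken from the package's bundled data).
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <name>Layer A</name>
      <Placemark>
        <name>First point</name>
        <Point><coordinates>-84.39,33.75,0</coordinates></Point>
      </Placemark>
      <Placemark>
        <name>Second point</name>
        <Point><coordinates>-81.10,32.08,0</coordinates></Point>
      </Placemark>
    </Folder>
  </Document>
</kml>"""


def kml_points_to_rows(kml_text):
    """Return one dict per Point placemark: folder, name, longitude, latitude."""
    root = ET.fromstring(kml_text)
    rows = []
    for folder in root.iterfind(".//kml:Folder", KML_NS):
        folder_name = folder.findtext("kml:name", default="", namespaces=KML_NS)
        for placemark in folder.iterfind("kml:Placemark", KML_NS):
            coords = placemark.findtext(
                "kml:Point/kml:coordinates", namespaces=KML_NS
            )
            if coords is None:
                continue  # not a point (e.g. a line or a polygon)
            # KML stores coordinates as "longitude,latitude[,altitude]".
            lon, lat = coords.strip().split(",")[:2]
            rows.append(
                {
                    "folder": folder_name,
                    "name": placemark.findtext(
                        "kml:name", default="", namespaces=KML_NS
                    ),
                    "longitude": float(lon),
                    "latitude": float(lat),
                }
            )
    return rows


for row in kml_points_to_rows(SAMPLE_KML):
    print(row)
```

A list of flat records like this is what makes the data easy to hand over to a plotting layer: one observation per row, one variable per column.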

The result is the tidykml package, which reads basic KML geometries into tibbles, a.k.a. tidy data frames. The "Sherman's March" example shown at the top of this note is an elaboration on one of the two examples featured in the README of the package, the other example being this map of L.A. non-Hispanic gangs:

Both maps are the result of fewer than a dozen lines of code. The raw KML files used in the two examples are bundled with the package, in zipped KML format: see the ?gangs and ?states documentation pages for details and precise sources.

Some drastic limitations

As underlined in its README, the tidykml package is drastically limited in at least two ways. The first limitation is that the package was conceived for, and tested against, KML files from either GADM, a database of global administrative areas, or from Google My Maps. As a result, it might misbehave with KML files from other sources.

The second limitation is an even more drastic one: the tidykml package takes the easy way out of multi-geometries (such as multi-polygons) by only taking into account the first element of these geometries. This means, for instance, that a U.S. state that contains islands will lose these islands on import (provided that the first polygon of the state holds its mainland component, and all further elements hold its islands).
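The effect of keeping only the first element of a multi-geometry can be illustrated with a small sketch. Again, this is Python with the standard library, not the package's code, and the sample placemark (a "mainland" square followed by a smaller "island" square) and the function names are invented for the purpose.

```python
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Invented example: one placemark made of two polygons.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Mainland plus island</name>
    <MultiGeometry>
      <Polygon><outerBoundaryIs><LinearRing><coordinates>
        0,0 4,0 4,4 0,4 0,0
      </coordinates></LinearRing></outerBoundaryIs></Polygon>
      <Polygon><outerBoundaryIs><LinearRing><coordinates>
        6,1 7,1 7,2 6,2 6,1
      </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </MultiGeometry>
  </Placemark>
</kml>"""


def ring_coords(polygon):
    """Return the outer ring of a Polygon as a list of (lon, lat) tuples."""
    text = polygon.findtext(
        "kml:outerBoundaryIs/kml:LinearRing/kml:coordinates", namespaces=NS
    )
    return [tuple(float(v) for v in pt.split(",")[:2]) for pt in text.split()]


def read_polygons(kml_text, first_only):
    """Read polygon outlines, optionally keeping only the first per placemark."""
    root = ET.fromstring(kml_text)
    outlines = []
    for placemark in root.iterfind(".//kml:Placemark", NS):
        polygons = placemark.findall(".//kml:Polygon", NS)
        if first_only:
            polygons = polygons[:1]  # the shortcut: drop everything else
        outlines.extend(ring_coords(p) for p in polygons)
    return outlines


print(len(read_polygons(SAMPLE, first_only=True)))   # island is dropped
print(len(read_polygons(SAMPLE, first_only=False)))  # both polygons kept
```

Note that the shortcut only keeps the mainland if the mainland happens to come first in the file, which is exactly the proviso stated above.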

Both limitations above have solutions, but these solutions are too complex with regard to the goal of the tidykml package, which is available on GitHub but will probably never be available on CRAN, given its experimental nature and limitations. In the future, I would rather trust other R packages to develop comprehensive and straightforward methods to visualize KML files and other forms of spatial data.


Update (December 31, 2016): the package has now been tested against GADM data. The limits of the package are very obvious: since it does not handle inner boundaries (holes in polygons), the map for France at level 0, for instance, is a complete failure. Similarly, very detailed maps (France at level 4, for instance) take a long time to process.
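For readers wondering what handling inner boundaries would involve, here is one possible tidy layout, sketched in Python with the standard library. It is an assumption about how the data could be organized, not something the package does: the sample polygon, the function name polygon_rings, and the column names (polygon, ring, hole) are all hypothetical. Every ring gets its own identifier, and a flag records whether it is a hole.

```python
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Invented example: a square with a square hole in the middle.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Square with a hole</name>
    <Polygon>
      <outerBoundaryIs><LinearRing><coordinates>
        0,0 10,0 10,10 0,10 0,0
      </coordinates></LinearRing></outerBoundaryIs>
      <innerBoundaryIs><LinearRing><coordinates>
        4,4 4,6 6,6 6,4 4,4
      </coordinates></LinearRing></innerBoundaryIs>
    </Polygon>
  </Placemark>
</kml>"""


def polygon_rings(kml_text):
    """Return one row per vertex, tagged with polygon id, ring id and hole flag."""
    root = ET.fromstring(kml_text)
    rows = []
    for polygon_id, polygon in enumerate(root.iterfind(".//kml:Polygon", NS), 1):
        ring_id = 0
        # Outer boundary first, then any inner boundaries (holes).
        for tag, is_hole in (("outerBoundaryIs", False), ("innerBoundaryIs", True)):
            path = f"kml:{tag}/kml:LinearRing/kml:coordinates"
            for coords in polygon.iterfind(path, NS):
                ring_id += 1
                for point in coords.text.split():
                    lon, lat = point.split(",")[:2]
                    rows.append(
                        {
                            "polygon": polygon_id,
                            "ring": ring_id,
                            "hole": is_hole,
                            "longitude": float(lon),
                            "latitude": float(lat),
                        }
                    )
    return rows


rows = polygon_rings(SAMPLE)
print(sum(1 for r in rows if not r["hole"]), "outer vertices")
print(sum(1 for r in rows if r["hole"]), "hole vertices")
```

Reading only the outer boundary discards the second set of vertices entirely, so the hole is drawn as if it were filled.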

  • First published on December 31st, 2016