Spatial Machine Learning

An introduction to applying machine learning techniques to spatial data with R

Justin Morgan Williams


Photo by DeepMind on Unsplash


I became interested in data analysis and data science by way of a Geographic Information Systems (GIS) course during my Environmental Policy and Sustainability Management Master’s at The New School. We used ArcGIS, the industry-standard proprietary GIS software; however, after graduation I lost access to this costly program. That is when I became obsessed with replicating the GIS workflow in an open-source environment (see some of my other blogs, namely GIS project with Python and GeoPandas).

Naturally, with a love for GIS and data science, I began tackling a few projects that applied machine learning to spatial data. At first, however, I wasn’t aware of the challenges unique to spatial data. According to Jiang¹, the following aspects make applying machine learning to spatial data challenging:

  • Spatial autocorrelation: observations located near one another tend to have similar values, so samples are not independent
  • Spatial heterogeneity: the data do not follow an identical distribution across the study area
  • Limited ground truth: many explanatory variables, but few labeled (ground-truth) observations
  • Multiple scales and resolutions: the same data may exist at multiple spatial scales and resolutions

If unaccounted for, these challenges can affect machine learning predictions and produce sub-optimal results. This blog details spatial prediction techniques that attempt to address the first two challenges: spatial autocorrelation and spatial heterogeneity.
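To make spatial autocorrelation concrete, it is commonly quantified with Moran's I, which compares each value's deviation from the mean with its neighbours' deviations. The sketch below is a base R illustration, not necessarily the method used later in this post: the helper names `morans_i` and `rook_weights` and the two 4 x 4 toy grids are hypothetical, and it assumes binary rook-contiguity weights (cells that share an edge are neighbours).

```r
# Moran's I: I = n / sum(w) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  n <- length(x)
  z <- x - mean(x)              # deviations from the mean
  num <- sum(w * outer(z, z))   # weighted cross-products of neighbouring deviations
  den <- sum(z^2)               # total squared deviation
  n * num / (sum(w) * den)
}

# Hypothetical helper: binary rook-contiguity weights for an nr x nc grid
rook_weights <- function(nr, nc) {
  n <- nr * nc
  w <- matrix(0, n, n)
  # Column-major cell index, matching how matrix() and as.vector() order cells
  cell <- function(i, j) (j - 1) * nr + i
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      k <- cell(i, j)
      if (i > 1)  w[k, cell(i - 1, j)] <- 1   # neighbour above
      if (i < nr) w[k, cell(i + 1, j)] <- 1   # neighbour below
      if (j > 1)  w[k, cell(i, j - 1)] <- 1   # neighbour to the left
      if (j < nc) w[k, cell(i, j + 1)] <- 1   # neighbour to the right
    }
  }
  w
}

w <- rook_weights(4, 4)

# Toy grid 1: a smooth west-to-east gradient, so neighbours have similar values
gradient <- as.vector(matrix(rep(1:4, each = 4), nrow = 4))
# Toy grid 2: a checkerboard, so every neighbour has the opposite value
checker <- as.vector(outer(1:4, 1:4, function(i, j) (i + j) %% 2))

round(morans_i(gradient, w), 3)  # 0.667: positive autocorrelation
morans_i(checker, w)             # -1: negative autocorrelation
```

With no spatial autocorrelation the expected value is -1/(n - 1), roughly -0.07 for 16 cells, so the gradient is strongly clustered and the checkerboard is perfectly dispersed. For real data, a tested implementation with a significance test, such as `moran.test` in the spdep package, is a safer choice than a hand-rolled function.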

Spatial Data

Spatial data comes in a few different types:

  • Points: individual point locations, e.g. cities on a map
  • Lines: line strings, e.g. roads on a map
  • Polygons (or multi-polygons): closed, multi-sided shapes bounding an area, e.g. census tracts
  • Cells: regular grid cells, typically used for areal data such as raster-based imagery
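As a rough illustration of the vector types above, the sketch below builds one geometry of each kind with the sf package, assuming sf is installed. The coordinates are made up for illustration, and raster cells are usually handled by a separate package such as terra, so they are not shown here.

```r
library(sf)

# Point: a single coordinate pair (made-up longitude/latitude)
city <- st_point(c(-73.99, 40.73))

# Line: an ordered sequence of coordinates, e.g. a road segment
road <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))

# Polygon: a list of closed rings (first and last coordinates are identical)
tract <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# Multi-polygon: several polygons stored as one feature
tracts <- st_multipolygon(list(
  list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))),
  list(rbind(c(2, 2), c(3, 2), c(3, 3), c(2, 3), c(2, 2)))
))

# Collect the geometries into one simple-feature column and inspect their types
geoms <- st_sfc(city, road, tract, tracts)
st_geometry_type(geoms)
```

Each type is stored differently (a single coordinate, a coordinate matrix, or nested lists of rings), which is one reason each needs its own preprocessing before it can feed a machine learning model.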

These types require different processing before machine learning techniques can be applied to them. This blog will deal exclusively with…




Data scientist passionate about the intersection of sustainability and data.