Spatial Machine Learning

An intro to applying machine learning techniques to spatial data with R

Justin Morgan Williams

Background

I became interested in data analysis and data science by way of a Geographic Information Systems (GIS) course during my Environmental Policy and Sustainability Management Master’s at The New School. We used ArcGIS, the industry-standard proprietary software, but after graduation I lost access to this costly program. That is when I became obsessed with replicating the GIS workflow in an open-source environment (see some of my other blogs, namely GIS project with Python and GeoPandas).

Naturally, having a love for GIS and data science, I began to tackle a few projects that applied machine learning concepts to spatial data. Initially, however, I wasn’t aware of the challenges unique to spatial data. According to Jiang¹, the following aspects make applying machine learning to spatial data difficult:

  • Spatial autocorrelation: observations that are close together tend to have similar values, so samples are not independent of one another
  • Spatial heterogeneity: the data do not follow an identical distribution across the study area
  • Limited ground truth: many explanatory variables, but few labeled observations
  • Multiple scales and resolutions: the data may exist at multiple scales and resolutions

If unaccounted for, these can affect machine learning predictions and produce sub-optimal results. This blog details spatial prediction techniques that attempt to address the first two challenges: spatial autocorrelation and spatial heterogeneity.
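To make spatial autocorrelation concrete, here is a minimal base R sketch of Moran's I, a common statistic for measuring it. Everything in it is an assumption made for illustration: the five sites, their values, the `morans_i` helper name, and the distance `bandwidth` used to decide which sites count as neighbours. It is not taken from Jiang or from the projects mentioned above.

```r
# Hypothetical toy data: five sites along a line.
coords <- cbind(x = c(0, 1, 2, 3, 4), y = c(0, 0, 0, 0, 0))
smooth_values <- c(1, 2, 3, 4, 5)       # neighbours have similar values
alternating_values <- c(1, 5, 1, 5, 1)  # neighbours have dissimilar values

# Moran's I with binary distance-band weights (illustrative helper).
# Two sites are neighbours when they are at most `bandwidth` apart.
morans_i <- function(values, coords, bandwidth = 1.5) {
  d <- as.matrix(dist(coords))          # pairwise Euclidean distances
  w <- (d > 0 & d <= bandwidth) * 1     # 1 for neighbours, 0 otherwise
  z <- values - mean(values)            # deviations from the mean
  n <- length(values)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

i_smooth <- morans_i(smooth_values, coords)
i_alternating <- morans_i(alternating_values, coords)
print(c(smooth = i_smooth, alternating = i_alternating))
```

With these toy values, the smooth gradient gives a positive statistic (similar neighbours) and the alternating pattern gives a negative one (dissimilar neighbours). For real data, a dedicated package with significance testing would be a better choice than this hand-rolled version.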

Spatial Data

Spatial data comes in a few different types:

  • Points: point reference data, e.g. cities on a map
  • Lines: line strings, e.g. roads on a map
  • Polygons (or Multi-Polygons): closed shapes with multiple sides, e.g. census tracts
  • Cells: typically used in areal data such as raster-based imagery

These types require different processes to work with machine learning techniques. This blog will deal exclusively with…
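As a rough illustration of how these types differ as data structures, the sketch below represents each one with plain base R objects and derives one simple quantity from each. The coordinates, the variable names (`cities`, `road`, `tract`, `cells`), and the choice of plain matrices are all assumptions for illustration; in practice, dedicated packages such as sf (vector data) and terra (raster data) provide proper spatial classes.

```r
# Points: one row per location (hypothetical city coordinates).
cities <- cbind(x = c(0, 2, 4), y = c(0, 1, 3))
n_cities <- nrow(cities)

# Line: an ordered sequence of vertices, e.g. a road.
road <- cbind(x = c(0, 3, 3), y = c(0, 4, 6))
# Length is the sum of the straight segments between consecutive vertices.
road_length <- sum(sqrt(rowSums(diff(road)^2)))

# Polygon: a closed ring, with the first vertex repeated at the end.
tract <- cbind(x = c(0, 4, 4, 0, 0), y = c(0, 0, 3, 3, 0))
i <- seq_len(nrow(tract) - 1)
# Area via the shoelace formula.
tract_area <- abs(sum(tract[i, "x"] * tract[i + 1, "y"] -
                      tract[i + 1, "x"] * tract[i, "y"])) / 2

# Cells: a regular grid of values, as in raster imagery.
cells <- matrix(1:12, nrow = 3, ncol = 4)
cell_mean <- mean(cells)

print(c(cities = n_cities, road_length = road_length,
        tract_area = tract_area, cell_mean = cell_mean))
```

Each structure calls for a different kind of feature: counts or distances for points, lengths for lines, areas for polygons, and per-cell summaries for grids.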
