Spatial Machine Learning
An introduction to applying machine learning techniques to spatial data with R
I became interested in data analysis and data science by way of a Geographic Information Systems (GIS) course during my Environmental Policy and Sustainability Management Master’s at The New School. We used ArcGIS, the industry-standard proprietary software, but after graduation I lost access to this costly program. That is when I became obsessed with replicating the GIS workflow in an open source environment (see some of my other blogs, namely GIS project with Python and GeoPandas).
Naturally, having a love for both GIS and data science, I began to tackle a few projects that applied machine learning concepts to spatial data. Initially, though, I wasn’t aware of the challenges that are unique to spatial data. According to Jiang¹, the following aspects make applying machine learning to spatial data a challenge:
- Spatial autocorrelation: observations that are close together in space tend to have similar values, so they are not independent of one another
- Spatial heterogeneity: the data do not follow an identical distribution across the study area
- Limited ground truth: many explanatory variables are available, but ground truth observations are scarce
- Multiple scales and resolutions: the data may exist at multiple scales and resolutions
If unaccounted for, these can affect machine learning predictions and produce sub-optimal results. This blog will detail spatial prediction techniques that attempt to address the first two challenges: spatial autocorrelation and spatial heterogeneity.
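To make spatial autocorrelation concrete, here is a minimal sketch in base R that computes Moran's I, a common measure of spatial autocorrelation, on a made-up 5 x 5 grid. The grid, the rook-contiguity weights, and the `morans_i()` helper are all assumptions for illustration and are not taken from Jiang or from the projects described in this blog. Values near 1 indicate that neighbouring cells are similar, while values near 0 indicate little spatial pattern.

```r
# Hypothetical 5 x 5 grid of cell centroids (illustrative data only)
coords <- expand.grid(x = 1:5, y = 1:5)

# Binary rook-contiguity weights: cells exactly one unit apart are neighbours
d <- as.matrix(dist(coords))
w <- (abs(d - 1) < 1e-9) * 1

# Moran's I: (n / sum of weights) * weighted cross-products / total variation
morans_i <- function(z, w) {
  z_c <- z - mean(z)                      # centre the values
  n <- length(z)
  (n / sum(w)) * sum(w * outer(z_c, z_c)) / sum(z_c^2)
}

# A smooth gradient: neighbouring cells have similar values
gradient <- coords$x + coords$y

# The same values with their locations shuffled, which breaks the pattern
set.seed(42)
shuffled <- sample(gradient)

morans_i(gradient, w)   # 0.75: strong positive spatial autocorrelation
morans_i(shuffled, w)   # much closer to 0
```

The two vectors contain exactly the same values, so any model that ignores location would treat them identically. Only the spatial arrangement differs, and that is what Moran's I picks up.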
Spatial data comes in a few different types:
- Points: point reference data, e.g., cities on a map
- Lines: line strings, e.g., roads on a map
- Polygons (or Multi-Polygons): closed shapes with multiple sides, e.g., census tracts
- Cells: typically used in areal data such as raster-based imagery
These types require different processes to work with machine learning techniques. This blog will deal exclusively with…