Spatial Machine Learning
An intro to applying machine learning techniques to spatial data with R
Background
I became interested in data analysis and data science through a Geographic Information Systems (GIS) course I took during my Master's in Environmental Policy and Sustainability Management at The New School. In that course we used ArcGIS, the industry-standard proprietary GIS software. After graduation, however, I lost access to this costly program. That is when I became obsessed with replicating the GIS workflow in an open-source environment (see some of my other blogs, namely GIS project with Python and GeoPandas).
Given my love for GIS and data science, I naturally began tackling a few projects that applied machine learning to spatial data. At first, however, I wasn't aware of the challenges that are unique to spatial data. According to Jiang¹, the following characteristics make it difficult to apply machine learning to spatial data:
- Spatial autocorrelation: observations located close to one another tend to have similar values, so samples are not independent of each other
- Spatial heterogeneity: the data do not follow an identical distribution across the study area
- Limited ground truth: many explanatory variables are available, but relatively few ground-truth (labeled) observations
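To make the first challenge more concrete, here is a minimal sketch of global Moran's I, a widely used statistic for measuring spatial autocorrelation, written in base R. It is an illustration rather than part of Jiang's¹ framework or my own workflow. The helper names `morans_i` and `rook_weights`, the 5 x 5 grid, and the binary rook-contiguity weights (cells that share an edge count as neighbors) are all hypothetical choices made for the example. For real projects, a tested implementation such as the one in the spdep package is the safer choice.

```r
# Hypothetical helper: global Moran's I for values x and a spatial weight matrix w
# I = (n / sum(w)) * sum_ij w_ij * z_i * z_j / sum_i z_i^2, where z = x - mean(x)
morans_i <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)                 # deviations from the mean
  num <- sum(w * outer(z, z))      # weighted cross-products of neighboring deviations
  den <- sum(z^2)                  # total variation
  (n / sum(w)) * num / den
}

# Hypothetical helper: binary rook-contiguity weights for an nr x nc grid
# (two cells are neighbors when their Manhattan distance is exactly 1)
rook_weights <- function(nr, nc) {
  coords <- expand.grid(row = seq_len(nr), col = seq_len(nc))
  d <- as.matrix(dist(coords, method = "manhattan"))
  (d == 1) * 1
}

coords <- expand.grid(row = 1:5, col = 1:5)
w <- rook_weights(5, 5)

smooth  <- coords$row + coords$col           # values change gradually across space
checker <- (coords$row + coords$col) %% 2    # every neighbor has the opposite value

print(morans_i(smooth, w))   # 0.75: strong positive spatial autocorrelation
print(morans_i(checker, w))  # -1: perfect negative spatial autocorrelation
```

Values near +1 mean that neighboring locations hold similar values, values near -1 mean that neighbors differ systematically, and values close to the expected value under no spatial pattern, -1/(n - 1), suggest little autocorrelation. A strongly positive result like the smooth surface above is a sign that randomly split training and test sets may leak information between neighboring observations.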