How we used machine learning to search satellite images of the north-western regions of Ukraine (70,000 km², roughly the area of the Czech Republic) for sites of illegal amber mining.
In 2010, world prices for amber started to surge. By 2012 demand was so high that the north-western part of Ukraine became the scene of an "amber rush", a "new Wild West". Thousands of prospectors started to search for the gems, first with shovels and later with water pumps. Hundreds of hectares of forest and agricultural land turned into a desert, a lifeless lunar landscape. The years 2014-2016 were the most intense period of illegal mining, but it is still going on today. How it looks from a drone:
We decided to estimate, for the first time, the scale of environmental impact from this phenomenon.
Research how patches of land with illegal mining look on satellite images (examine images of known amber-mining locations, use existing videos and photos, interview field experts).
Find which map providers have relatively recent satellite images with good resolution, and find examples of mining sites on such images. (We mostly used Bing Maps by Microsoft. It has a very generous API; for example, it provides the metadata we needed, such as the date of each image. Because the dug holes are small, we needed a resolution no coarser than 1 m per pixel.)
Tile distribution by year:
Year | 2011 | 2012 | 2014 | 2015 | 2016 |
Number of images | 1933 | 4669 | 117059 | 271893 | 55403 |
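The 1 m per pixel requirement can be turned into a minimum zoom level. The sketch below uses the standard Web Mercator ground-resolution formula for 256-pixel tiles, which is the scheme the Bing Maps tile system is built on. The latitude of 51° is our assumption for the region, the function names are illustrative, and the text above does not state which zoom level or tile size was actually requested.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 equatorial radius used by Web Mercator
TILE_SIZE_PX = 256          # base tile size assumed by the formula


def ground_resolution(latitude_deg, zoom):
    """Return metres per pixel at the given latitude and zoom level."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    map_width_px = TILE_SIZE_PX * 2 ** zoom
    return math.cos(math.radians(latitude_deg)) * circumference / map_width_px


def min_zoom_for(latitude_deg, max_metres_per_pixel):
    """Return the lowest zoom level whose resolution meets the requirement."""
    zoom = 1
    while ground_resolution(latitude_deg, zoom) > max_metres_per_pixel:
        zoom += 1
    return zoom


if __name__ == "__main__":
    latitude = 51.0  # assumed latitude, roughly north-western Ukraine
    zoom = min_zoom_for(latitude, 1.0)
    print(zoom, round(ground_resolution(latitude, zoom), 2))
```

Under these assumptions, zoom level 16 gives about 1.5 m per pixel, so level 17 (about 0.75 m per pixel) is the first one that satisfies the requirement.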
Find and compile an initial set of coordinates for images with traces of mining (we found the first ~50 such places with huge help from the participants of Open Data Day Kyiv).
Split each such tile into superpixels, or segments (parts of the image with approximately homogeneous visual appearance), using the simple linear iterative clustering (SLIC) algorithm (http://www.kev-smith.com/papers/SLIC_Superpixels.pdf).
Use a neural net to extract features for each superpixel. (We used transfer learning with a pretrained, vanilla ResNet50 from the excellent Keras library by François Chollet and others.)
Create a labelled set of superpixels for a binary classifier (split the images into two sets: with traces of amber mining and without such traces). Some interesting examples of "false positives", i.e. images that look similar to amber mining.
Create a machine learning model to classify superpixels (XGBoost was chosen for its best performance after several attempts with other models: SVM, RandomForest). Estimate its performance (production model: F1 = 0.91, recall = 0.88, precision = 0.95; we cared only about a lower bound of the estimate, and therefore about false positives, so high precision is a must). Debug the classifier visually with an interactive scatterplot.
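To give a feel for what SLIC does, here is a deliberately simplified, dependency-free sketch of its core loop on a grayscale image. It is not the implementation used in the project (the text does not say which one was used); in practice a library routine such as scikit-image's `skimage.segmentation.slic` is the usual choice. The function name `slic_gray` and its parameters are illustrative. The sketch skips the seed perturbation and connectivity clean-up of the full algorithm, and assumes the image is at least `step` pixels on each side.

```python
import math


def slic_gray(image, step, compactness=10.0, iterations=5):
    """Simplified SLIC on a grayscale image given as a list of rows.

    Returns a grid of integer labels with the same shape as the image.
    """
    height, width = len(image), len(image[0])
    # Seed cluster centres (intensity, y, x) on a regular grid.
    centers = [
        [float(image[y][x]), float(y), float(x)]
        for y in range(step // 2, height, step)
        for x in range(step // 2, width, step)
    ]
    labels = [[-1] * width for _ in range(height)]
    for _ in range(iterations):
        best = [[math.inf] * width for _ in range(height)]
        for k, (value, cy, cx) in enumerate(centers):
            # Search only a window of about 2*step around each centre.
            y0, y1 = max(int(cy) - step, 0), min(int(cy) + step + 1, height)
            x0, x1 = max(int(cx) - step, 0), min(int(cx) + step + 1, width)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    d_color = image[y][x] - value
                    d_space = math.hypot(y - cy, x - cx)
                    # Combined distance: intensity plus scaled spatial term.
                    dist = math.hypot(d_color, compactness * d_space / step)
                    if dist < best[y][x]:
                        best[y][x] = dist
                        labels[y][x] = k
        # Move each centre to the mean of the pixels assigned to it.
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for y in range(height):
            for x in range(width):
                acc = sums[labels[y][x]]
                acc[0] += image[y][x]
                acc[1] += y
                acc[2] += x
                acc[3] += 1
        centers = [
            [acc[0] / acc[3], acc[1] / acc[3], acc[2] / acc[3]] if acc[3] else old
            for acc, old in zip(sums, centers)
        ]
    return labels


if __name__ == "__main__":
    # Toy 8x8 image: dark left half, bright right half.
    toy = [[0] * 4 + [100] * 4 for _ in range(8)]
    for row in slic_gray(toy, step=4):
        print(row)
```

The `compactness` parameter trades colour similarity against spatial proximity: larger values give more regular, grid-like superpixels, smaller values let segments follow intensity edges more closely.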
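As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The helper names below are illustrative, and the confusion counts in the usage example are made up for demonstration, not taken from our evaluation.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) computed from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Reported production-model values from the text above.
    print(round(f1_score(0.95, 0.88), 2))

    # Hypothetical counts, only to show how the inputs are obtained.
    p, r = precision_recall(true_pos=90, false_pos=10, false_neg=30)
    print(p, r, round(f1_score(p, r), 3))
```

With precision 0.95 and recall 0.88 this gives an F1 of about 0.91, in line with the figure quoted for the production model.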
Apply steps 4 and 5 (segmentation and feature extraction) and the classifier from step 7 to each superpixel of every image in the region of interest. If more than 2 superpixels are classified as positive, mark the image as a mining area (note that this halved our false positive error). We processed approximately 450,000 images covering a region with a total area of about 70,000 km²; total computation time was ~100 hours on one computer with two GeForce GTX 960 GPUs.
Our readers (local residents of the region) pointed out a whole class of errors caused by specific deforested places that we had used as positive examples. We removed all such places from the training dataset, retrained the model, reprocessed all images and published a new version of the map after an additional review by field experts (this took 2 weeks).
Create an interactive map of the places found by our method. (The most active period of mining was 2014-2016, and most map tiles are dated 2015. We found more than 1,000 hectares of damaged land.)
Result: we present the most detailed (as of this writing) interactive map of the environmental impact of illegal amber mining in Ukraine (in English).
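The image-level decision rule is simple enough to state in code. This is a minimal sketch under the assumption that the classifier's per-superpixel outputs are already available as booleans or 0/1 values; the function name and the `min_positive` parameter are illustrative, with "more than 2" expressed as "at least 3".

```python
def is_mining_tile(superpixel_predictions, min_positive=3):
    """Flag an image when enough of its superpixels are classified positive.

    `superpixel_predictions` is an iterable of truthy/falsy values, one per
    superpixel of a single image.
    """
    positives = sum(1 for prediction in superpixel_predictions if prediction)
    return positives >= min_positive


if __name__ == "__main__":
    print(is_mining_tile([1, 0, 1, 1, 0]))  # three positives
    print(is_mining_tile([1, 1, 0, 0, 0]))  # only two positives
```

Requiring several positive superpixels per image means that a single misclassified segment cannot flag a tile on its own, which favours precision over recall.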