Search for illegal amber mining on satellite images, HOWTO
How we created the dataset and trained a model
This is a description of the methodology for the project Leprosy of the Land, made by Texty in March 2018. The main idea was to use machine learning to find all places of illegal amber mining in the north-western regions of Ukraine on satellite images.
In 2010, world prices for amber started to surge. By 2012, demand was so high that the north-western part of Ukraine became the scene of an "amber rush" and a "new Wild West". Thousands of prospectors started to search for gems with shovels and later with water pumps. Hundreds of hectares of forest and agricultural land became a desert, a lifeless moonscape. The most intense years of illegal mining were 2014-2016, but it is still going on.
We decided to estimate, for the first time, the scale of the environmental impact of this phenomenon.
1. Research how patches of land with illegal mining look on satellite images (look at images of known amber mining locations, use known videos and photos, interview field experts).
2. Find which map providers have relatively recent satellite images with good resolution, and find examples of places with mining on those images. (This was mostly Bing Maps by Microsoft. It has an excellent API that, for example, provides useful metadata such as the date of each tile. Because the dug holes are small, we needed a resolution of 1 m per pixel or finer.)
3. Find and compile an initial set of coordinates for images with traces of mining (we found the first places with mining with huge help from the participants of Open Data Day Kyiv).
4. Split each such tile into superpixels/segments (parts of the image with an approximately homogeneous visual appearance).
5. Use a neural net to extract features for each superpixel (we used a pretrained, vanilla ResNet50 from the Keras library).
6. Create a labelled set of superpixels for a binary classifier (split the images into two sets: with traces of amber mining and without such traces).
7. Create a machine learning model to classify superpixels (XGBoost was chosen for the best performance after several tries).
8. Apply steps 4 and 5 and the classifier from step 7 to each superpixel/segment of all images from the region of interest. If more than 2 superpixels are classified as positive, mark the current image as an area of mining. (We processed approximately 450,000 images covering a region with a total area of about 70,000 km^2; total computation time was ~100 hours on one computer with two GeForce GTX 960 cards.)
9. Create an interactive map with the places found by our method. (The most active period of mining was 2014-2016. Most map tiles are dated 2015. We found more than 1,000 hectares of damaged land.)
10. Result: we present the most informative interactive map to date of the environmental impact of illegal amber mining in Ukraine (this article is in English).
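Two of the rules in the list above are concrete enough to write down: the resolution requirement from the map provider step and the "more than 2 superpixels" decision for a tile. The sketch below is only an illustration, not code from the project. It assumes the ground-resolution formula from the Bing Maps Tile System documentation (256-pixel Web Mercator tiles, zoom levels 1 to 23). The function names, the 0.5 probability cutoff and the 51°N sample latitude are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius used by Web Mercator
TILE_SIZE_PX = 256          # base tile size assumed by the formula


def ground_resolution(latitude_deg, zoom):
    """Metres per pixel at a given latitude and zoom level (Web Mercator)."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    map_width_px = TILE_SIZE_PX * 2 ** zoom
    return math.cos(math.radians(latitude_deg)) * circumference / map_width_px


def min_zoom_for(latitude_deg, max_m_per_px=1.0, max_zoom=23):
    """Smallest zoom level whose resolution is max_m_per_px or finer."""
    for zoom in range(1, max_zoom + 1):
        if ground_resolution(latitude_deg, zoom) <= max_m_per_px:
            return zoom
    return None  # no level in the assumed range is fine enough


def tile_has_mining(superpixel_probs, cutoff=0.5, more_than=2):
    """Flag a tile if more than `more_than` superpixels are classified positive.

    The 0.5 cutoff is an assumption; the text above does not state one.
    """
    positives = sum(1 for p in superpixel_probs if p >= cutoff)
    return positives > more_than


if __name__ == "__main__":
    print(min_zoom_for(51.0))                      # 17
    print(round(ground_resolution(51.0, 17), 2))   # 0.75
    print(tile_has_mining([0.9, 0.8, 0.7, 0.1]))   # True
```

Under this formula, zoom level 17 is the first level finer than 1 m per pixel at about 51°N, a latitude assumed here as representative of the region.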
- Step 1: How to split a map tile into "superpixels"
- Step 2: How to create features for an image. Model training and testing
- Step 3: Detect places with amber mining
- Add. 1: Debug the classifier with an interactive scatterplot
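The notebooks contain the actual code. As a rough illustration of the hand-off between Step 1 and Step 2, the sketch below cuts one rectangular patch per superpixel out of a tile. It takes a label map such as the one returned by a superpixel algorithm (for example `skimage.segmentation.slic`). Cropping by bounding box, the function name `superpixel_patches` and the toy arrays are assumptions made for this example, and the notebooks may do this differently.

```python
import numpy as np


def superpixel_patches(image, labels):
    """Yield (label, patch) pairs, one bounding-box crop per superpixel.

    image:  H x W x C array holding one map tile
    labels: H x W integer array, one segment id per pixel
    """
    for label in np.unique(labels):
        rows, cols = np.nonzero(labels == label)
        top, bottom = rows.min(), rows.max() + 1
        left, right = cols.min(), cols.max() + 1
        # Pixels inside the box but outside the segment are kept as they are.
        yield int(label), image[top:bottom, left:right]


if __name__ == "__main__":
    # Toy 4 x 6 "tile" split into a left and a right segment.
    tile = np.zeros((4, 6, 3), dtype=np.uint8)
    segments = np.zeros((4, 6), dtype=int)
    segments[:, 3:] = 1
    for label, patch in superpixel_patches(tile, segments):
        print(label, patch.shape)
```

Each patch could then be resized to the input size of the feature extractor before Step 2.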
Sources of inspiration
We first thought about such an (amber-related) project back in 2016, while working on deforestation in the Carpathian Mountains. But at that time we did not have relatively recent satellite images with high enough resolution.
I'd like to say a couple of kind words about Terrapattern, an amazing project which proved that similar analysis of satellite images is possible.
For the article, we used a novel way to make the transition between the main visual elements (fragments of satellite images) and an interactive map with the same images. You should see it for yourself :)
- The project was nominated for the Prix Europa media competition (Online category) in September 2018
- A paper about the project's methodology was presented at the Computation+Journalism Symposium 2019, Miami. Text of the presentation, link to the article (in PDF format)
- Bronze medal in SND's Best of Digital Design competition, 2018