Big data tools to handle various cryospheric remote sensing datasets, mostly in python.

weiji14/cryospheric-data-lakes

Cryospheric Data Lakes

License: Open Data Commons Attribution | License: LGPL v3 | License: CC BY-SA 4.0

Open-source big data tools to handle various cryospheric remote sensing datasets.

... a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files... ~Wikipedia

Contents

Find the underlying data used in this project here (or at least links to the sources, since the files themselves might be too big).

Examine the code here which mingles with the data to give some (hopefully) nice scientifically meaningful outputs (whatever that means). You may find some interesting dockerfiles and python3 code inside (if that clicks with you).

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Pre-requisites

You have some form of git installed for version control. Ideally, docker should be installed too to fully replicate this scientific development environment, unless you do not have root/admin privileges. For conda users, you may skip the docker install, but take note of the section below on setting up a conda environment.

For Debian/Ubuntu-based systems, you can try something like:

sudo apt install git docker-ce

Note: You may need to set up Docker's apt repository first before docker-ce can be installed. See the instructions for Debian and Ubuntu.

For Windows, if you have chocolatey (recommended!), it can be as easy as:

choco install git docker

For Mac OS X:

TODO??
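Whichever platform you are on, it can help to confirm that the tools are actually on your PATH before going further. Here is a minimal sketch using only the Python standard library; the `find_tools` helper and the `TOOLS` tuple are made up for this illustration and are not part of this repository.

```python
import shutil

# Tools named in the pre-requisites above. conda is optional and only
# matters if you skip the docker install.
TOOLS = ("git", "docker", "conda")


def find_tools(names=TOOLS):
    """Map each tool name to its executable path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:8s} {path or 'not found on PATH'}")
```

A `None` entry only means the executable was not found on the PATH of the current shell; it does not tell you whether the docker daemon is running or whether you have permission to use it.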

Cloning the repository

With git installed, fire up your command prompt and do a git clone from this repo-url:

git clone <repo-url>

Alternatively, download the zip file from here, and unzip it.

The standard clone code above will skip over some submodules, such as external tutorials I have cloned into the tuts folder. To get absolutely everything (beware beware!), you can do:

git clone --recursive <repo-url>
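If you already did a plain clone and only later decide you want the submodules, git can fetch them afterwards with `git submodule update --init --recursive`, run from inside the cloned folder. Below is a small Python sketch that wraps both routes. The helper names (`clone_command`, `submodule_update_command`, `run`) are hypothetical, and `<repo-url>` is left as a placeholder just like above.

```python
import subprocess


def clone_command(repo_url, with_submodules=False):
    """Build the git clone command for a plain or recursive clone."""
    cmd = ["git", "clone"]
    if with_submodules:
        # Same flag as in the command above; pulls in all submodules.
        cmd.append("--recursive")
    cmd.append(repo_url)
    return cmd


def submodule_update_command():
    """Build the command that fetches submodules inside an existing clone."""
    return ["git", "submodule", "update", "--init", "--recursive"]


def run(cmd, cwd=None):
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

For example, `run(clone_command("<repo-url>"))` followed by `run(submodule_update_command(), cwd=...)` with `cwd` pointing at the freshly cloned folder should end up in the same state as the recursive clone.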

Setup conda environment (for Anaconda/Miniconda users)

You can replicate most of the libraries used in this repository by running:

conda env create --file=environment.yml
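Once the environment is created, you still need to activate it, and for that you need its name. By convention the name sits in a top-level `name:` key of environment.yml, though that is an assumption about this repository's file rather than something stated here. A rough sketch for reading it without extra dependencies (the `conda_env_name` helper is hypothetical):

```python
from pathlib import Path


def conda_env_name(path="environment.yml"):
    """Return the value of the top-level `name:` key, or None if absent."""
    for line in Path(path).read_text().splitlines():
        # Only unindented lines can be top-level keys.
        if line.startswith("name:"):
            # Drop any trailing comment, then surrounding spaces and quotes.
            value = line[len("name:"):].split("#", 1)[0]
            return value.strip().strip("'\"") or None
    return None
```

With the name in hand, `conda activate <name>` (or `source activate <name>` on older conda versions) switches into the environment. This is deliberately a line-based lookup rather than a full YAML parser, so it will miss unusual layouts.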

Running the code

To try out the code (which downloads big data files, processes the data, etc.), you can use a JupyterLab or Jupyter Notebook environment. Do so by running either one of the commands below:

jupyter lab
jupyter notebook

Alternatively, you can use the atom-hydrogen-beta docker container here to ensure ease of reproducibility (aka mitigate dependency hell problems). Yes, I like to do my code writing and execution inside that 'atom' docker container with interactive Hydrogen functionality!!

[Image: atom-demo-10]

But of course, you can install the libraries yourself.

Contributing

Feel free to submit a pull request or issue (nice ways of saying hi!) if you'd like to see something in here that's not here yet.

License

Data

Any raw data (e.g. binary satellite files) used here is licensed accordingly as per the upstream source. Derived datasets are licensed under the Open Data Commons Attribution license unless otherwise stated.

Code

Source code used in the handling of the data is licensed under the GNU Lesser General Public License v3.0.

Other

Other forms of content (such as documentation) in this project repository that are not covered by the above two licenses are licensed under the Creative Commons Attribution Share Alike 4.0 License. Linked submodules (e.g. in the tuts folder) are subject to their respective upstream licenses.