# Ground RISE Camp Tutorial

## Basic Exercises

To get started with Ground, we will use some of the "Aboveground" services that we have already developed. Aboveground services are tools that users use to interface with Ground at a higher semantic level than the simple node-and-edge-based API.

We will begin by using a tool that automatically populates GitHub repositories into Ground.

In [None]:
import ground_git_client

REPO_NAME = "ground-context/risecamp"
ground_git_client.add_repo(REPO_NAME)

Now that Ground is aware of some code, we want to do something with that code. The repository that we populated has some simple Python scripts that are "Ground-aware"\* as well as a small amount of data for us to analyze in the form of a CSV file.

We're going to download that repository locally using the `download_repo` command below. You can find the repo online [here](). We will run a simple script that's going to take our CSV data and split up our currently single-column data into the following fields:

* 1
* 2
* 3
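As a rough illustration of what splitting single-column data into fields can look like, here is a minimal sketch. It is not the script from the repository: the `|` delimiter, the sample rows, and the `field_1` to `field_3` names are all hypothetical stand-ins for the placeholder fields listed above.

```python
import csv
import io

# Hypothetical sample: a single column whose values pack three fields,
# separated by "|". The real data.csv may use a different layout.
RAW = "record\na|b|c\nd|e|f\n"

# Placeholder names standing in for the three target fields.
FIELD_NAMES = ["field_1", "field_2", "field_3"]


def split_single_column(text, delimiter="|", field_names=FIELD_NAMES):
    """Split each single-column row into one value per named field."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = row[0].split(delimiter)
        if len(values) != len(field_names):
            raise ValueError("unexpected number of fields in row: %r" % row)
        rows.append(dict(zip(field_names, values)))
    return rows


rows = split_single_column(RAW)
for r in rows:
    print(r)
```

Raising on a row with the wrong number of fields keeps malformed input from silently producing a misaligned derived dataset.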

However, before doing that, we need to make sure that Ground knows about the base dataset that we are transforming. Using another Aboveground tool that we have already developed, you can automatically let Ground know about this new dataset. This tool will populate Ground with some useful information about the file including the file type, the size of the file, and the path to the file.

\*When we say that these scripts are Ground-aware, we mean that we have instrumented them to know how to interact with Ground and automatically publish useful data context into Ground in the due course of their execution.

In [None]:
import ground_file_client

FILE_PATH = "./data.csv"
ground_file_client.add_file(FILE_PATH)
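To give a feel for the kind of information described above (file type, size, and path), the following sketch gathers it with the Python standard library. This is an assumption about what such metadata might look like, not the implementation of `ground_file_client`; the `describe_file` helper and its dictionary keys are hypothetical, and the file type is simply derived from the extension.

```python
import os
import tempfile


def describe_file(path):
    """Collect basic file-level metadata: absolute path, size, and type."""
    abs_path = os.path.abspath(path)
    _, ext = os.path.splitext(abs_path)
    return {
        "path": abs_path,
        "size_bytes": os.path.getsize(abs_path),
        # Assumption: the extension is a good enough proxy for the file type.
        "file_type": ext.lstrip(".").lower() or "unknown",
    }


# Demonstrate on a throwaway CSV file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "data.csv")
    with open(sample, "w", newline="") as f:
        f.write("a,b\n1,2\n")
    info = describe_file(sample)

print(info)
```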

Now that Ground knows about our base dataset, we can go about transforming it. Since the scripts that we are using are Ground-aware, they are going to generate lineage information in Ground as a part of transforming the data. It will tell Ground that it's created a new dataset based on the old input dataset, and it will associate this lineage information with the latest version of the source code that was used for the transformation.
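The lineage relationship described above can be pictured with a small in-memory model. This is only a conceptual sketch, not Ground's data model or API: the `NodeVersion` and `LineageEdgeVersion` classes, the `data_split.csv` name, and the `commit` tag are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

# Simple shared id generator so every version gets a unique id.
_ids = count(1)


@dataclass
class NodeVersion:
    """One version of a dataset (hypothetical stand-in for a node version)."""
    node_name: str
    tags: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class LineageEdgeVersion:
    """Records that one dataset version was derived from another."""
    from_version_id: int
    to_version_id: int
    tags: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))


# The base dataset and the dataset produced by the transformation.
base = NodeVersion("data.csv")
derived = NodeVersion("data_split.csv")  # hypothetical output name

# The lineage edge links input to output and carries a reference to the
# version of the source code that performed the transformation.
edge = LineageEdgeVersion(
    from_version_id=base.id,
    to_version_id=derived.id,
    tags={"commit": "<latest commit hash>"},  # placeholder, not a real hash
)

print(edge)
```

Keeping the code version on the edge, rather than on either dataset, reflects the idea that the transformation itself is what ties the input, the output, and the source code together.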

In [None]:
# execute the Python script in the repository cloned above

Now that we've spent a bunch of time populating information into Ground, it's time to see everything we've done. Using the Ground API client, for which you can find complete documentation [here](), determine the following pieces of information:

In [None]:
import ground

# assumes GroundClient is exposed at the top level of the ground package
gc = ground.GroundClient()

# the id of the node version for the base dataset (hint: you can use the latest API)

# the id of the lineage edge version that connects the base dataset to the derived dataset

# all of the tags of the derived dataset

## Ground & ML Models

## Extending Ground

In this section, we will walk you through how you might extend Ground to populate your own data context. Before we go any further, let's first reset our Ground instance by running the following cell. If you mistakenly add data to Ground during this exercise, you can rerun the same cell to wipe Ground and start over:

In [None]:
!cd $GROUND_HOME/resources/scripts/postgres && python2.7 postgres_setup.py ground ground drop

Before we start writing our own Aboveground tool, let's first dig a little deeper into how the Ground file populator component works. Begin by opening the `ground_file_client.py` file in another tab. After walking through the comments there, return here to continue with the exercises.