## Basic Exercises

To get started with Ground, we will use some of the "Aboveground" services that we have already developed. Aboveground services are tools that let users interact with Ground at a higher semantic level than its simple node-and-edge-based API.
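To make the node-and-edge vocabulary concrete, the sketch below models a tiny version graph in plain Python. Each node (such as a dataset) accumulates immutable versions that can carry tags, and lineage edges connect a source version to a derived version. These are the same concepts the exercises at the end of this section ask about, but the classes and methods here (`VersionGraph`, `add_node_version`, `latest`, `add_lineage`) are hypothetical illustrations, not the `GroundClient` API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class NodeVersion:
    id: int
    node_key: str
    tags: Dict[str, str] = field(default_factory=dict)
    parent_id: Optional[int] = None  # previous version of the same node, if any


@dataclass
class LineageEdge:
    key: str
    from_version_id: int
    to_version_id: int


class VersionGraph:
    """Hypothetical, simplified stand-in for a node/version/lineage model."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.versions: Dict[int, NodeVersion] = {}
        self.history: Dict[str, List[int]] = {}  # node key -> version ids, oldest first
        self.lineage: Dict[str, LineageEdge] = {}

    def add_node_version(self, node_key: str,
                         tags: Optional[Dict[str, str]] = None) -> NodeVersion:
        # New versions never overwrite old ones; they extend the node's history.
        prior = self.history.get(node_key, [])
        version = NodeVersion(next(self._ids), node_key, dict(tags or {}),
                              prior[-1] if prior else None)
        self.versions[version.id] = version
        self.history.setdefault(node_key, []).append(version.id)
        return version

    def latest(self, node_key: str) -> NodeVersion:
        # The "latest" version is simply the most recent one in the history.
        return self.versions[self.history[node_key][-1]]

    def add_lineage(self, key: str, source: NodeVersion,
                    derived: NodeVersion) -> LineageEdge:
        edge = LineageEdge(key, source.id, derived.id)
        self.lineage[key] = edge
        return edge


# Usage: two datasets and the lineage edge between them.
g = VersionGraph()
raw = g.add_node_version("data.txt", {"type": "text/plain"})
split = g.add_node_version("split_data.csv", {"columns": "3"})
g.add_lineage("data.txt_to_split_data.csv", raw, split)
print(g.latest("data.txt"), g.lineage["data.txt_to_split_data.csv"])
```

Keeping every version (rather than updating a node in place) is what makes questions like "which version was the input to this transformation?" answerable later.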

We will begin by using a tool that automatically populates Ground with GitHub repositories.

```python
from aboveground import ground_git_client

REPO_NAME = "ground-context/risecamp"
ground_git_client.add_repo(REPO_NAME)
```

Now that Ground is aware of some code, we want to do something with it. The repository we populated contains some simple Python scripts that are "Ground-aware"\* as well as a small amount of data to analyze, in the form of a CSV file.

We're going to download that repository locally using the `download_repo` command below. You can find the repo online [here](). We will then run a simple script that takes our CSV data and splits its current single column into three columns of type `int`, `string`, and `int`.
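The repository's `column_splitter.py` is not reproduced here. As a hypothetical sketch of this kind of transformation, the code below assumes each input line packs an `int`, a string, and an `int` separated by a `|` delimiter (an assumption; the real data format may differ) and writes them out as three CSV columns. The helper names `split_line` and `split_lines` are illustrative only.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def split_line(line: str, sep: str = "|") -> Tuple[int, str, int]:
    # ASSUMPTION: each input line packs an int, a string, and an int joined
    # by `sep`; the actual format of the input data may differ.
    first, rest = line.rstrip("\n").split(sep, 1)
    middle, last = rest.rsplit(sep, 1)  # the string field may itself contain `sep`
    return int(first), middle, int(last)


def split_lines(lines: Iterable[str], out: TextIO, sep: str = "|") -> int:
    """Write one CSV row per non-blank input line; return the row count."""
    writer = csv.writer(out)
    rows = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        writer.writerow(split_line(line, sep))
        rows += 1
    return rows


# Usage with in-memory text; a real run would read the input file and
# write the three-column CSV to disk.
src = io.StringIO("1|alpha|10\n2|beta|20\n")
dst = io.StringIO()
split_lines(src, dst)
print(dst.getvalue())
```

Splitting once from the left and once from the right keeps the sketch correct even when the string field contains the delimiter.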

However, before doing that, we need to make sure that Ground knows about the base dataset we are transforming. Another Aboveground tool that we have already developed registers this dataset with Ground automatically, populating Ground with useful information about the file, including its type, its size, and its path.

\*When we say that these scripts are Ground-aware, we mean that we have instrumented them to interact with Ground and to automatically publish useful data context into Ground in the course of their execution.
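The internals of the Aboveground file tool aren't shown here. As a rough, hypothetical sketch, the metadata it is described as recording (file type, size, and path) can be collected with the Python standard library. The `describe_file` helper and its dictionary keys are illustrative assumptions, not the tags that `ground_file_client` actually writes.

```python
import mimetypes
import os
import tempfile
from typing import Dict, Union


def describe_file(path: str) -> Dict[str, Union[str, int, None]]:
    """Hypothetical helper: gather the type, size, and path of a file.

    The keys are illustrative; they are not the tags aboveground writes.
    """
    file_type, _encoding = mimetypes.guess_type(path)  # guessed from the extension
    return {
        "path": os.path.abspath(path),
        "size_bytes": os.path.getsize(path),
        "file_type": file_type,  # None when the extension is unknown
    }


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "data.txt")
    with open(sample, "w") as f:
        f.write("hello")
    print(describe_file(sample))
```

Note that `mimetypes.guess_type` only looks at the file extension; it does not inspect the file's contents.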

```python
from aboveground import ground_file_client

FILE_PATH = "repo/data.txt"
ground_file_client.add_file(FILE_PATH)
```

Now that Ground knows about our base dataset, we can transform it. Because the scripts we are using are Ground-aware, they generate lineage information in Ground as part of transforming the data: the script tells Ground that it has created a new dataset derived from the input dataset, and it associates this lineage information with the latest version of the source code used for the transformation.

Running the script will take a minute because it scans through a lot of data.
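As a rough, hypothetical illustration of what Ground-aware instrumentation might look like, the sketch below wraps a transformation in a decorator that, once the transformation succeeds, records a lineage entry linking the input dataset, the output dataset, and a fingerprint of the code that produced it. Everything here is an assumption: `ground_aware`, `LINEAGE_LOG` (a plain list standing in for Ground), `FILES` (a dict standing in for the file system), and the bytecode hash used as the code version (the real scripts may instead use a Git commit). The key format mirrors the `data.txt_to_split_data.csv` key used in the exercise below.

```python
import functools
import hashlib
from typing import Callable, Dict, List

# Hypothetical stand-in for Ground: each lineage record is a plain dict.
LINEAGE_LOG: List[Dict[str, str]] = []

# Hypothetical stand-in for the file system, so the sketch runs anywhere.
FILES: Dict[str, str] = {"data.txt": "1|alpha|10\n2|beta|20\n"}


def code_version(fn: Callable) -> str:
    # ASSUMPTION: a hash of the function's bytecode stands in for the
    # source-code version; real Ground-aware scripts may use a Git commit.
    return hashlib.sha256(fn.__code__.co_code).hexdigest()[:12]


def ground_aware(fn: Callable[[str, str], None]) -> Callable[[str, str], None]:
    """Run a transformation, then record lineage from source to derived."""

    @functools.wraps(fn)
    def wrapper(source: str, derived: str) -> None:
        fn(source, derived)  # lineage is recorded only if this succeeds
        LINEAGE_LOG.append({
            "key": f"{source}_to_{derived}",
            "from": source,
            "to": derived,
            "code_version": code_version(fn),
        })

    return wrapper


@ground_aware
def split_columns(source: str, derived: str) -> None:
    # Toy transformation: turn "|"-separated fields into CSV.
    FILES[derived] = FILES[source].replace("|", ",")


split_columns("data.txt", "split_data.csv")
print(LINEAGE_LOG[-1])
```

Recording lineage after the transformation returns, rather than before it runs, means a failed run never leaves a lineage edge pointing at a dataset that was never produced.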

```python
# run the Python script inside the repository cloned above
!cd repo && python column_splitter.py
```
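Outside a notebook, the `!` shell escape is unavailable. A plain-Python equivalent, sketched below with a hypothetical `run_script` helper, uses the standard library's `subprocess.run` with the repository directory as the working directory.

```python
import subprocess
import sys


def run_script(repo_dir: str, script: str) -> subprocess.CompletedProcess:
    """Run `script` with repo_dir as the working directory.

    Mirrors `!cd repo && python column_splitter.py`, but uses the current
    interpreter (sys.executable) and raises CalledProcessError on failure.
    """
    return subprocess.run(
        [sys.executable, script],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )


# Usage (assumes the repository was cloned into ./repo):
# result = run_script("repo", "column_splitter.py")
# print(result.stdout)
```

Using `sys.executable` rather than whichever `python` comes first on the `PATH` runs the script with the same interpreter as the notebook kernel, which is presumably where packages such as `aboveground` are installed.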

Now that we've spent some time populating information into Ground, it's time to see everything we've done. Using the Ground API client, whose complete documentation you can find [here](), determine the following pieces of information:

```python
from aboveground.ground import GroundClient

gc = GroundClient()

# 1. The ID of the node version for the base dataset.
#    Hint: use the "latest" API, get_node_latest(node_key).

# 2. The lineage edge and lineage edge version connecting the two datasets.
#    Hint: get_lineage_edge_latest(le_key); for now the key is
#    "data.txt_to_split_data.csv" (to be changed for the final version).

# 3. All of the tags of the derived dataset.
#    Hint: get_node_version(id).
```

The full solution is provided [here]().