[A Netflix dataset](https://www.kaggle.com/shivamb/netflix-shows), available as a CSV file, will be imported into TerminusDB using the Python client. Instructions for installing the Python client can be found in the [repository](https://github.com/terminusdb/terminusdb-client-python).

## Importing libraries
Import the required libraries first:

- TerminusDB (Python client)
- pandas
- tqdm
- tempfile
- random

In [None]:
from typing import Optional, Set
from terminusdb_client import Client
from terminusdb_client.woqlschema.woql_schema import (
    DocumentTemplate,
    EnumTemplate,
    WOQLSchema,
    LexicalKey,
)

import pandas as pd
from tqdm import tqdm
import tempfile
import random

## Schema definition
Once the columns in the dataset are identified, a schema must be created based on that information. The Netflix dataset contains the following columns:

- title
- type
- director
- cast
- country
- release_year
- rating
- duration
- listed_in
- description
- date_added

From these columns, the schema defines one main class, `Content`, plus a `User` class that records which contents a user has watched, and two Enums, `Content_Type` and `Rating`.

In [None]:

schema = WOQLSchema()

class Content(DocumentTemplate):
    _schema = schema
    title: str
    type_of: "Content_Type"
    director: Optional[str]
    cast: Optional[str]
    country_of_origin: Optional[str]
    release_year: int
    rating: "Rating"
    duration: str
    listed_in: str
    description: str
    date_added: Optional[str]

class User(DocumentTemplate):
    _schema = schema
    _key = LexicalKey(keys="id")
    _base = "User"
    id : str
    watched_contents: Set["Content"]

class Content_Type(EnumTemplate):
    _schema = schema
    TV_Show = "TV Show"
    Movie = "Movie"

class Rating(EnumTemplate):
    _schema = schema
    TV_MA = "TV-MA"
    R = ()
    PG_13 = "PG-13"
    TV_14 = "TV-14"
    TV_PG = "TV-PG"
    NR = ()
    TV_G = "TV-G"
    TV_Y = "TV-Y"
    TV_Y7 = "TV-Y7"
    TY = ()
    TY_7 = "TY-7"
    PG = ()
    G = ()
    NC_17 = "NC-17"
    TV_Y7_FV = "TV-Y7-FV"
    UR = ()


## Reading and importing data
The dataset will be read using `pandas` and inserted into TerminusDB by calling the `insert_content_data` and `insert_user_data` functions. To avoid `Connection Timed Out` errors, the dataset is read in chunks of 1000 rows. Every chunk is processed individually by the `read_data` function, which normalizes the `type` and `rating` values so that they match the Enum member names before the data is imported.

In [None]:
def insert_content_data(client, url):
    df = pd.read_csv(url, chunksize=1000)
    for chunk in tqdm(df, desc='Transferring data'):
        csv = tempfile.NamedTemporaryFile()
        chunk.to_csv(csv)
        netflix_content = read_data(csv.name)
        client.insert_document(netflix_content, commit_msg="Adding all Netflix content")

# Generate and insert 50 random users, each having watched up to 9 of the given contents
def insert_user_data(contents):
    users = []
    for i in range(0,50):
        randomlist = random.sample(range(1, 50), i%10)
        watched_contents = set()
        for index in randomlist:
            watched_contents.add(schema.import_objects(contents[index]))

        users.append(User(id=str(i), watched_contents = watched_contents))

    client.insert_document(users, commit_msg="Adding users")

def read_data(csv):
    records = []
    df = pd.read_csv(csv)
    for index, row in df.iterrows():

        type_of = row['type'].replace(" ", "_")
        rating = "NR" if pd.isna(row['rating']) else row['rating'].replace("-", "_")

        # The keyword arguments must match the field names declared in the Content class
        records.append(Content(
            title=row['title'],
            type_of=Content_Type[type_of],
            director=str(row['director']),
            cast=str(row['cast']),
            country_of_origin=str(row['country']),
            release_year=row['release_year'],
            rating=Rating[rating],
            duration=row['duration'],
            listed_in=row['listed_in'],
            description=row['description'],
            date_added=str(row['date_added']),
        ))

    return records
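
Note that `read_data` converts the optional columns with `str(...)`, which turns an empty cell (a float NaN in pandas) into the literal string `"nan"`. The sketch below shows one possible way to handle missing values and to derive the Enum member names in one place. The helper names `is_missing`, `optional_str`, and `enum_member_name` are hypothetical and use only the standard library.

```python
import math

def is_missing(value):
    """True for None and for float NaN, which is how pandas marks empty cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def optional_str(value):
    """Return None for a missing cell instead of the literal string 'nan'."""
    return None if is_missing(value) else str(value)

def enum_member_name(value, default=None):
    """Map a raw CSV value such as 'TV-MA' or 'TV Show' to an Enum member name."""
    if is_missing(value):
        return default
    return str(value).strip().replace("-", "_").replace(" ", "_")

print(enum_member_name("TV-MA"))             # TV_MA
print(enum_member_name("TV Show"))           # TV_Show
print(enum_member_name(float("nan"), "NR"))  # NR
print(optional_str(float("nan")))            # None
```

Inside `read_data`, such helpers could replace the inline `replace` calls, for example `Rating[enum_member_name(row['rating'], "NR")]`, assuming every resulting name exists in the Enum.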

## Database connection
You must establish a connection to either a local instance of TerminusDB Server (running at http://127.0.0.1:6363) or a TerminusX account, then create a database named `Netflix`. The cell below connects to TerminusX; if creating the database fails (for example, because it already exists), it connects to the existing database instead. The schema defined above is then inserted into TerminusDB by calling its `commit` method. Finally, `insert_content_data` loads the dataset, the first 50 `Content` documents are queried, and `insert_user_data` uses them to generate 50 random users. The `team` variable refers to the team associated with your TerminusX account; replace its value accordingly. An API key is required to use TerminusX. Follow the instructions [here](https://docs.terminusdb.com/v10.0/#/terminusx/get-your-api-key) to get your API key. Don't forget to set the `TERMINUSDB_ACCESS_TOKEN` environment variable to your API key.

In [None]:
if __name__ == "__main__":
    db_id = "Netflix"
    url = "netflix.csv"

    # TODO: change the team name 
    team = "TeamName"
    client = Client("https://cloud.terminusdb.com/"+team)
    
    try:
        client.connect(team=team, use_token=True)
        client.create_database(db_id, label = "Netflix Graph", description = "Create a graph with Netflix data")
    except Exception:
        client.connect(db=db_id, team=team, use_token=True)

    schema.commit(client, commit_msg = "Adding Netflix Schema")
    
    insert_content_data(client, url)

    contents = client.query_document({"@type"  : "Content"}, count=50)

    insert_user_data(list(contents))
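
Because the connection depends on the `TERMINUSDB_ACCESS_TOKEN` environment variable, a missing token is a common cause of failed connections. The following minimal sketch checks for the variable before connecting; `require_token` is a hypothetical helper, not part of the Python client.

```python
import os

TOKEN_VARIABLE = "TERMINUSDB_ACCESS_TOKEN"

def require_token(environ=os.environ):
    """Return the API key, or raise a clear error if the variable is unset or empty."""
    token = environ.get(TOKEN_VARIABLE, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {TOKEN_VARIABLE} environment variable to your TerminusX API key."
        )
    return token

# Demonstration with an explicit mapping instead of the real environment.
print(require_token({TOKEN_VARIABLE: "example-key"}))  # example-key
```

Calling `require_token()` with no arguments at the top of the `__main__` block would stop the script with a readable message before any request is sent.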

## Query documents
Get all documents:

In [None]:
documents = client.get_all_documents()

# documents comes back as an iterable that can be converted into a list
print("All documents")
print(list(documents))

Get the documents that match a template using `query_document`:

In [None]:
matches = client.query_document({"@type"  : "Content",
                                 "type_of": "Movie",
                                 "release_year": "2020"})

# matches comes back as an iterable that can be converted into a list
print(list(matches))
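
Since the results come back as an iterable, it is worth remembering that a one-shot iterator can only be consumed once. The sketch below illustrates this with hypothetical sample documents standing in for the client's return value; whether the client returns a one-shot iterator or a reusable collection is an assumption here, and `count_by_type` is an illustrative helper.

```python
from collections import Counter
from itertools import islice

def count_by_type(documents):
    """Count documents per '@type' while consuming the iterable only once."""
    return Counter(doc.get("@type", "unknown") for doc in documents)

# Hypothetical documents standing in for the result of client.get_all_documents().
sample_documents = iter([
    {"@type": "Content", "title": "Example A", "type_of": "Movie"},
    {"@type": "Content", "title": "Example B", "type_of": "TV_Show"},
    {"@type": "User", "id": "0"},
])

first_two = list(islice(sample_documents, 2))  # take a couple of items lazily
remaining = count_by_type(sample_documents)    # only the unread items are left
print(len(first_two), dict(remaining))         # 2 {'User': 1}
```

If you need to traverse the results more than once, convert them to a list first, as the cells above do.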

If you want to limit the number of records returned, pass `count=number` to either method:

In [None]:
documents = client.get_all_documents(count=10)

matches = client.query_document({"@type"  : "Content",
                                 "type_of": "Movie",
                                 "release_year": "2020"}, count=10)

## Commit history
Get the whole commit history with `get_commit_history`. Check the [documentation](https://terminusdb.github.io/terminusdb-client-python/woqlClient.html) for more information.

In [None]:
client.get_commit_history()

## Branches
`main` is the default branch when you create a new TerminusDB database. You can manage branches in your database with the Python client and run any of the following tasks:
- Create branch
- Delete branch
- List branches

### Create branch
You can create a new branch by calling the `create_branch` method, passing the name of the new branch and the `empty` variable as parameters. When `empty` is set to `False`, a new branch will be created, containing the schema and data inserted into the database previously. If set to `True`, an empty branch will be created.

In [None]:
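# Create a branch containing the schema and data already in the database
client.create_branch("some_branch", empty=False)

# Create an empty branch; a different name is used because "some_branch" now exists
client.create_branch("some_empty_branch", empty=True)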

### Delete branch
You can delete a branch by calling the `delete_branch` method and passing the name of the branch as a parameter.

In [None]:
client.delete_branch("some_branch")

### List branches
If you want to get a list of the branches, call the `get_all_branches` method. This method will return a list with details of the branches in your database.

In [None]:
branches = client.get_all_branches()

print(branches)

# Output:
# [{'@id': 'Branch/main', '@type': 'Branch', 'head': 'ValidCommit/ofxzh4i6jb9arf0nx5nicffhgcqxjco', 'name': 'main'}]
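
The returned list can be reduced to a simple lookup table. This minimal sketch reuses the sample output shown above; `branch_heads` is a hypothetical helper, and treating the part after `ValidCommit/` as the commit id is an assumption based on that sample.

```python
# Sample copied from the output shown above; a real database may list more branches.
branches = [
    {
        "@id": "Branch/main",
        "@type": "Branch",
        "head": "ValidCommit/ofxzh4i6jb9arf0nx5nicffhgcqxjco",
        "name": "main",
    }
]

def branch_heads(branch_list):
    """Map each branch name to the identifier at its head (prefix stripped)."""
    return {b["name"]: b["head"].split("/", 1)[-1] for b in branch_list}

heads = branch_heads(branches)
print(heads)                   # {'main': 'ofxzh4i6jb9arf0nx5nicffhgcqxjco'}
print("some_branch" in heads)  # False
```

A membership check like the last line is a simple way to avoid creating a branch whose name is already taken.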

## Time Travel

You can reset a branch to a particular commit, squash a branch, and rebase a branch using the following methods.

### Reset to commit
Reset the current branch HEAD to the specified commit path. 

In [None]:
client.reset('hvatquoq9531k1u223v4azcdr1bfyde')

### Squash
Squash the current branch HEAD into a commit.

In [None]:
commit_res = client.squash('This is a squash commit message!', "username")

# reset to the squash commit
client.reset(commit_res['api:commit'], use_path=True)

### Rebase
Rebase the current branch onto the specified remote branch.

In [None]:
client.rebase("main")

Check the [documentation](https://terminusdb.github.io/terminusdb-client-python/woqlClient.html) for more information.