# Brewery Tutorial
__[Open Brewery](https://www.openbrewerydb.org)__ DB is a free dataset and API with public information on breweries, cideries, brewpubs, and bottleshops. The goal of Open Brewery DB is to maintain an open-source, community-driven dataset and provide a public API. Datasets provided by the project are available in the following formats:
- __[CSV](https://github.com/openbrewerydb/openbrewerydb/blob/master/breweries.csv)__
- __[JSON](https://github.com/openbrewerydb/openbrewerydb/blob/master/breweries.json)__
- __[PostgreSQL SQL](https://github.com/openbrewerydb/openbrewerydb/blob/master/breweries.sql)__

This tutorial uses the CSV file.

TerminusDB Server must be installed on your system before you run the Python script. Follow the instructions in __[terminusdb-bootstrap](https://github.com/terminusdb/terminusdb-bootstrap)__; once installed, terminusdb-server runs as a Docker container at http://127.0.0.1:6363.
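
Before running anything against the database, you can check that something is listening at that address by opening a plain TCP connection. This is a minimal sketch that uses only the Python standard library. `server_is_up` is a hypothetical helper, and a successful connection only shows that the port accepts connections, not that the service behind it is TerminusDB.

```python
import socket


def server_is_up(host: str = "127.0.0.1", port: int = 6363, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the address and connects; closing it immediately is enough
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    # Defaults match the address used by terminusdb-bootstrap in this tutorial
    print("TerminusDB port reachable:", server_is_up())
```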

The TerminusDB Python client is also required. It can be installed from source or with `pip`; follow the instructions in the __[repository](https://github.com/terminusdb/terminusdb-client-python)__. Running `pip install terminusdb-client[dataframe]` also installs pandas, which is required for reading and importing data from CSV files. If you install the client from source, run `pip install pandas`. tqdm is used to display a progress bar; install it with `pip install tqdm`.

Columns in the dataset that are not required must be deleted before importing data from CSV.
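
One way to remove the unneeded columns is sketched below with the standard `csv` module (pandas' `DataFrame.drop` would work just as well). The `KEEP` list is an assumption derived from the schema defined later in this tutorial, and the output file name `breweries_trimmed.csv` is hypothetical; adjust both to your setup.

```python
import csv
import io
from typing import Iterable, List

# Assumed columns to keep, based on the schema later in this tutorial
KEEP: List[str] = [
    "name", "brewery_type", "street", "city", "state", "postal_code",
    "country", "longitude", "latitude", "phone", "website_url",
]


def trim_csv(lines: Iterable[str], keep: Iterable[str] = KEEP) -> str:
    """Return CSV text containing only the `keep` columns that exist in the input."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    # Keep the requested order and silently skip names missing from the header
    columns = [c for c in keep if c in header]
    out = io.StringIO()
    # extrasaction="ignore" drops every column not listed in fieldnames
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


# Usage (file names are assumptions):
# with open("breweries.csv", newline="") as src, \
#         open("breweries_trimmed.csv", "w", newline="") as dst:
#     dst.write(trim_csv(src))
```

The trimmed file is then the one passed to `terminusdb importcsv`. The whole file is built in memory, which is fine for a dataset of this size.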

## Scaffolding
Database and schema creation in TerminusDB can be managed through the scaffolding tool available from the command line.

- Create a new project

```shell
terminusdb startproject
```

This command creates a schema template (`schema.py`) and a configuration file (`config.json`) in the current working directory. Edit `schema.py` and replace the template schema definition with one that matches the data imported from the CSV file.

- Update schema

```shell
terminusdb commit
```

If the database does not exist yet, it is created and a connection is established. The schema in TerminusDB is then updated from `schema.py`.

- Sync schema

```shell
terminusdb sync
```

`schema.py` is updated with the schema currently stored in the database.

- Import CSV

```shell
terminusdb importcsv csv_file [options]
```

Data is read from the CSV file and imported into TerminusDB. Some records in the Open Brewery dataset contain `NA` values; to skip those records, run `terminusdb importcsv` with the `--na` option set to `skip`:

```shell
terminusdb importcsv breweries.csv --na skip
```
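
To estimate how many records `--na skip` will leave out, you can approximate its effect in Python by discarding every row that has a missing field. This is only an illustration of the skipping idea, not the CLI's implementation; the `MISSING` set is an assumption (a subset of the strings pandas treats as missing by default), and `complete_rows` and `count_kept` are hypothetical helpers.

```python
import csv
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumed subset of the strings pandas.read_csv treats as missing by default
MISSING = {"", "NA", "N/A", "NaN", "nan", "null", "NULL"}


def is_missing(value: Optional[object]) -> bool:
    """True for None (short rows) or a string that counts as a missing value."""
    return value is None or (isinstance(value, str) and value.strip() in MISSING)


def complete_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only the rows in which no field is missing."""
    for row in rows:
        if not any(is_missing(v) for v in row.values()):
            yield row


def count_kept(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (rows kept, total rows) for CSV text given as an iterable of lines."""
    rows = list(csv.DictReader(lines))
    kept = sum(1 for _ in complete_rows(rows))
    return kept, len(rows)
```

This also shows why mostly empty columns should be removed first: assuming `skip` drops a row whenever any column is missing, a single empty `address_3` would discard an otherwise complete record.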

- Delete database

```shell
terminusdb deletedb database_name
```
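
The non-interactive steps above can also be driven from a Python script, for example when rebuilding the database repeatedly. This sketch only calls commands shown in this section through `subprocess`; `terminusdb startproject` is left out because it asks questions interactively. `STEPS` and `run_steps` are hypothetical names, and by default the script only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence

# Commands taken from the steps above (startproject is omitted because it is interactive)
STEPS: List[List[str]] = [
    ["terminusdb", "commit"],
    ["terminusdb", "importcsv", "breweries.csv", "--na", "skip"],
]


def run_steps(steps: Sequence[Sequence[str]] = STEPS, dry_run: bool = True) -> List[str]:
    """Run each command in order, stopping at the first failure.

    With dry_run=True the commands are only printed. Returns the command lines.
    """
    executed = []
    for cmd in steps:
        line = shlex.join(cmd)  # shlex.join requires Python 3.8+
        print(("[dry run] " if dry_run else "$ ") + line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(list(cmd), check=True)
        executed.append(line)
    return executed
```

Call `run_steps(dry_run=False)` from the project directory created by `startproject` to actually execute the commands.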

## Schema creation
The dataset has the following columns:
- obdb_id
- name
- brewery_type
- street
- address_2
- address_3
- city
- state
- county_province
- postal_code
- website_url
- phone
- created_at
- updated_at
- country
- longitude
- latitude
- tags

Some of these columns are optional and rarely have a value, so they can be omitted when creating the schema and importing the data.
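
Whether a column is rarely filled can be checked by counting the non-empty values in each column. The sketch below uses the standard `csv` module; `fill_rates` is a hypothetical helper, and any cutoff you apply to its output is your own choice rather than something defined by the dataset.

```python
import csv
from typing import Dict, Iterable


def fill_rates(lines: Iterable[str]) -> Dict[str, float]:
    """Return, for each column, the fraction of rows with a non-empty value."""
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []
    filled = {c: 0 for c in columns}
    total = 0
    for row in reader:
        total += 1
        for c in columns:
            value = row.get(c)
            # Whitespace-only values count as empty
            if value is not None and value.strip():
                filled[c] += 1
    # A header-only file reports 0.0 for every column instead of dividing by zero
    return {c: (filled[c] / total if total else 0.0) for c in columns}
```

For example, `[c for c, r in fill_rates(f).items() if r < 0.5]` lists the columns that are empty in more than half of the rows, where `f` is the CSV file opened with `newline=""`.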

Analyzing the dataset:

- A brewery has a *name*, *type*, *address*, *phone* number and *website URL*
- A brewery can be any of eleven different types (micro, nano, regional, brewpub, large, planning, bar, contract, proprietor, closed, taproom)
- An address groups *street*, *city*, *state*, *country*, *postal code* and *coordinates*
- Coordinates are a pair of values: longitude and latitude
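
Before settling on the enum, it is worth checking that every `brewery_type` value in the CSV is one of the eleven types, since a value outside the list would not match the `Brewery_Type` enum. The sketch mirrors the types with Python's `enum.Enum`; `BreweryType` and `unknown_types` are illustrative names, not part of the TerminusDB client.

```python
import csv
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class BreweryType(Enum):
    """Plain-Python mirror of the eleven brewery types (illustrative only)."""
    micro = "micro"
    nano = "nano"
    regional = "regional"
    brewpub = "brewpub"
    large = "large"
    planning = "planning"
    bar = "bar"
    contract = "contract"
    proprietor = "proprietor"
    closed = "closed"
    taproom = "taproom"


def unknown_types(lines: Iterable[str]) -> Dict[str, int]:
    """Count brewery_type values in the CSV that are not members of BreweryType."""
    known = {t.value for t in BreweryType}
    counts = Counter(row["brewery_type"] for row in csv.DictReader(lines))
    return {value: n for value, n in counts.items() if value not in known}
```

An empty result means the enum covers every type present in the file.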

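Based on this analysis, the following classes are defined. Each class represents a document in the schema, except `Brewery_Type`, which is an enum; `Address` is declared as a subdocument, so each address is stored as part of the brewery that references it: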
- Brewery
- Brewery_Type
- Address
- City
- State
- Country
- Coordinates
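
The nesting described above (a brewery owns an address, which refers to a city, state, country and coordinates) can be illustrated with plain dataclasses whose field names follow `schema.py`. This is a sketch for reasoning about the structure only; the actual import is done by `terminusdb importcsv`. City, state and country are simplified to strings here, whereas the schema models them as separate documents with a `name` field, and parsing `latitude`/`longitude` as floats is an assumption about how the CSV stores them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Coordinates:
    latitude: float
    longitude: float


@dataclass
class Address:
    street: str
    city: str
    postal_code: str
    country: str
    state: Optional[str] = None
    coordinates: List[Coordinates] = field(default_factory=list)


@dataclass
class Brewery:
    name: str
    type_of: str
    phone: str
    website_url: str
    address_of: Address


def row_to_brewery(row: Dict[str, str]) -> Brewery:
    """Build the nested Brewery / Address / Coordinates structure from one CSV row."""
    coords = []
    # Attach coordinates only when both values are present
    if row.get("latitude") and row.get("longitude"):
        coords.append(Coordinates(float(row["latitude"]), float(row["longitude"])))
    address = Address(
        street=row["street"],
        city=row["city"],
        postal_code=row["postal_code"],
        country=row["country"],
        state=row.get("state") or None,  # empty state becomes None (Optional in the schema)
        coordinates=coords,
    )
    return Brewery(
        name=row["name"],
        type_of=row["brewery_type"],
        phone=row["phone"],
        website_url=row["website_url"],
        address_of=address,
    )
```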

Customize `schema.py` so that it defines these documents and matches the values imported from the CSV file.

Don't forget to commit and sync the schema once changes are made.

```python
####
# This is the script for storing the schema of your TerminusDB
# database for your project.
# Use 'terminusdb commit' to commit changes to the database and
# use 'terminusdb sync' to change this file according to
# the existing database schema
####

from typing import List, Optional

from terminusdb_client.woqlschema import DocumentTemplate, EnumTemplate, RandomKey


class Address(DocumentTemplate):
    """Postal address of a brewery, stored as a subdocument."""

    _subdocument = []
    city: "City"
    coordinates: List["Coordinates"]
    country: "Country"
    postal_code: str
    state: Optional["State"]
    street: str


class Brewery(DocumentTemplate):
    """A single brewery record from the Open Brewery DB dataset."""

    _key = RandomKey()
    address_of: "Address"
    name: str
    phone: str
    type_of: "Brewery_Type"
    website_url: str


class Brewery_Type(EnumTemplate):
    """The eleven brewery types found in the dataset."""

    micro = ()
    nano = ()
    regional = ()
    brewpub = ()
    large = ()
    planning = ()
    bar = ()
    contract = ()
    proprietor = ()
    closed = ()
    taproom = ()


class City(DocumentTemplate):
    _key = RandomKey()
    name: str


class Coordinates(DocumentTemplate):
    """Latitude/longitude pair of an address."""

    _key = RandomKey()
    latitude: float
    longitude: float


class Country(DocumentTemplate):
    _key = RandomKey()
    name: str


class State(DocumentTemplate):
    _key = RandomKey()
    name: str
```