In [None]:
pip install sphinx

navigate to ECS_demo/

touch core.py

cut and paste the core.py code from the end of this notebook into core.py

cd ../docs/

sphinx-autogen index.rst

make html

if you got an error about a missing theme: pip install sphinx_rtd_theme

# *pythonic* package development with the trimmings

## Overview

By the grace of open-source-dev there are several free lunches you should know of:

1. [sphinx](http://www.sphinx-doc.org/en/master/)
    1. sphinx can be a bit [finicky](https://samnicholls.net/2016/06/15/how-to-sphinx-readthedocs/). The most important feature to introduce to you today will be
    2. [autodocs](https://samnicholls.net/2016/06/15/how-to-sphinx-readthedocs/), where we generate documentation from just your
    3. [docstrings](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html). Super cool!
2. [read the docs](https://readthedocs.org/)
    1. not only is it free, but read the docs has a magnificent protocol for turning your hard-earned digital documentation into a pdf. Possibly my favorite feature I'll mention today.
3. [travis CI](https://travis-ci.org/)
    1. "CI" stands for continuous integration. These folks provide you with a free service -- up to 1 hour of CPU time on their servers to run all of your unit tests.
4. [pypi](https://pypi.org/)
    1. You want people using your code as fast as possible, right?
5. [coveralls](https://coveralls.io/)
    1. how much of your code do the tests in that passed build actually cover?!
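
To make the docstring idea concrete, here is a minimal sketch of a Google-style docstring of the kind the napoleon link above describes. The function `count_words` is a made-up example and is not part of `ECS_demo`; the point is only the layout of the `Args:` and `Returns:` sections.

```python
import inspect


def count_words(text, lowercase=True):
    """Count how often each word occurs in a string.

    Args:
        text (str): The text to split on whitespace.
        lowercase (bool, optional): If True, fold words to lower case
            before counting. Defaults to True.

    Returns:
        dict: A mapping from each word to its number of occurrences.
    """
    if lowercase:
        text = text.lower()
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# The docstring is stored on the function itself; this is the text
# that documentation tools work from.
print(inspect.getdoc(count_words))
```

Nothing here depends on Sphinx being installed. The docstring is plain text attached to the function, which is why documentation can be generated from the source code alone.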

## Sphinx

1. Clone this repo.

2. cd into the main directory and check out the directory structure with `tree`:

```
$ tree
.
├── ECS_demo
│   ├── __init__.py
│   ├── core.py
│   ├── data
│   │   ├── climate_sentiment_m1.h5
│   │   └── tweet_global_warming.csv
│   ├── input.py
│   ├── tests
│   │   ├── __init__.py
│   │   └── test_ECS_demo.py
│   └── version.py
├── LICENSE
├── README.md
├── appveyor.yml
├── docs
│   ├── Makefile
│   ├── _static
│   ├── conf.py
│   ├── index.rst
│   └── source
│       ├── ECS_demo.core.rst
│       ├── ECS_demo.rst
│       └── ECS_demo.tests.rst
├── examples
│   ├── README.ipynb
│   └── README.txt
├── requirements.txt
└── setup.py
```

3. We're about to find out just how busy this directory structure can be with these added open source features. But for now, the main project lives under `ECS_demo/` with `tests/` and `data/` subdirectories.

4. Go ahead and inspect the contents of the `core.py` and `test_ECS_demo.py` files, in case you're interested. There are some common elements here in the package development world. `core.py` contains, well, the core code of the package. In a larger package you might have other modules living here such as `analysis.py` or `visualize.py`, depending on how you want to organize your code. For now, the `core.py` file contains three functions (`load_data`, `data_setup`, `baseline_model`) and one class (`Benchmark`). You can learn more about pythonic naming conventions from the [PEP 8](https://www.python.org/dev/peps/pep-0008/) documentation.

5. Time to get to Sphinx! cd over to the `docs/` directory. In this tutorial, I've set up the appropriate `.rst` files already. Personally, I haven't had excellent luck with sphinx-quickstart or sphinx-autogen, so I always start with a template such as this and modify the `.rst` files as needed. If you are interested in creating your documentation from scratch, I found this [source](https://samnicholls.net/2016/06/15/how-to-sphinx-readthedocs/) helpful.

6. All you need to do is type `make html` in the `docs/` directory where your `Makefile` is sitting, and Sphinx will generate the static html pages of your site. The output lands in a new `_build/` directory:

```
$ tree -L 2
.
├── Makefile
├── _build
│   ├── doctrees
│   └── html
├── _static
├── conf.py
├── index.rst
└── source
    ├── ECS_demo.core.rst
    ├── ECS_demo.rst
    └── ECS_demo.tests.rst
```

7. Use your preferred browser to check out your site: `open _build/html/index.html`

![](demo1.png)

8. If you navigate 
3. Next we'll do autodocs. Make sure the autodoc command works. We'll put the html files into a `docs/` directory.
4. After we have the `docs/` directory populated, it's time to upload to readthedocs.
5. Travis CI and coveralls next. These babies take time to build on virtual machines, which makes for a good demo of creating 'slim' unit tests.
6. Lastly we'll cover pypi, the dist repo we'll be most concerned with.
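
For the autodoc step, the part of `docs/conf.py` that usually matters is small. The sketch below is an assumption about what such a file might contain, not a copy of this repo's `conf.py`: it puts the repository root on `sys.path` so that `ECS_demo` can be imported, and enables the `sphinx.ext.autodoc` and `sphinx.ext.napoleon` extensions.

```python
# docs/conf.py (sketch; these values are assumptions, not the repo's actual settings)
import os
import sys

# Make the package importable from docs/ so autodoc can import ECS_demo.core.
sys.path.insert(0, os.path.abspath('..'))

project = 'ECS_demo'

extensions = [
    'sphinx.ext.autodoc',   # pull documentation out of docstrings
    'sphinx.ext.napoleon',  # understand Google- and NumPy-style docstrings
]

# Needs `pip install sphinx_rtd_theme` before `make html` will work.
html_theme = 'sphinx_rtd_theme'
```

With settings like these, an `.rst` file can pull in a module's docstrings with the `automodule` directive (for example `.. automodule:: ECS_demo.core` with the `:members:` option).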

# Next Steps

1. sdist vs bdist
2. GCP
3. APIs (Twitter, for one)

Put this in core.py:

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Convolution1D
from keras.layers.convolutional import MaxPooling1D
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.utils import np_utils
from keras.models import load_model
from os.path import dirname, join
import sys
import time
import statistics


def load_data(data_file_name, h5File=False):
    """
    Loads data from module_path/data/data_file_name.

    Parameters
    ----------
    data_file_name : string
        name of the file to be loaded from module_path/data/.
    h5File : boolean, optional, default = False
        if True, loads a saved Keras model from an hdf5 file
        instead of reading a csv file

    Returns
    -------
    data : Pandas DataFrame, or a Keras model if h5File is True
    """
    module_path = dirname(__file__)
    if h5File:
        data = load_model(join(module_path, 'data', data_file_name))
    else:
        with open(join(module_path, 'data', data_file_name), 'rb') as csv_file:
            data = pd.read_csv(csv_file, encoding='latin1')
    return data


def data_setup(top_words=1000, max_words=150):
    """
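    Preprocesses the twitter climate data: one-hot encodes the output
    labels and converts the tweets to padded integer sequences.

    Parameters
    ----------
    top_words : int, optional, default = 1000
        number of most frequent words kept by the tokenizer
    max_words : int, optional, default = 150
        length every tweet sequence is padded or truncated to

    Returns
    -------
    X, Y : arrays of padded sequences and one-hot encoded labels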
    """
    data = load_data("tweet_global_warming.csv")
    print("Full dataset: {}".format(data.shape[0]))
    # replace NA's in existence with "ambiguous"
    data['existence'] = data['existence'].fillna('ambiguous')
    # rename so the encoder doesn't get confused
    data['existence'] = data['existence'].replace({'Y': 'Yes', 'N': 'No'})
    data = data.dropna()  # now drop NA values
    print("dataset without NaN: {}".format(data.shape[0]))
    X = data.iloc[:, 0]
    Y = data.iloc[:, 1]
    # X still holds whole tweets here, so this counts unique tweets
    print("Number of unique tweets: {}".format(len(np.unique(np.hstack(X)))))

    # one hot encoding = dummy vars from categorical var
    # Create a one-hot encoded binary matrix
    # N, Y, Ambig
    # 1, 0, 0
    # 0, 1, 0
    # 0, 0, 1

    # encode class as integers
    encoder = LabelEncoder()
    encoder.fit(Y)
    encoded_Y = encoder.transform(Y)

    # convert integers to one hot encoded
    Y = np_utils.to_categorical(encoded_Y)

    # convert X to ints (y is already done)
    token = Tokenizer(num_words=top_words,
                      filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~', lower=True,
                      split=' ', char_level=False, oov_token=None)
    token.fit_on_texts(texts=X)
    X = token.texts_to_sequences(texts=X)
    X = sequence.pad_sequences(X, maxlen=max_words)
    return X, Y

def baseline_model(top_words=1000, max_words=150, filters=32):
    """
    baseline model developed by sarah. so ask her!
    :return:
    model object
    """
    model = Sequential()
    model.add(Embedding(top_words + 1, filters,
                        input_length=max_words))  # is it better to preconvert using word to vec?
    model.add(Convolution1D(filters=filters, kernel_size=3, padding='same',
                            activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(250, activation='relu'))
    model.add(Dense(3, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

class Benchmark:
    """
    benchmark method used by the unittests
    """
    @staticmethod
    def run(function):
        timings = []
        stdout = sys.stdout
        for i in range(5):
            sys.stdout = None  # silence anything the function prints
            startTime = time.time()
            try:
                function()
                seconds = time.time() - startTime
            finally:
                sys.stdout = stdout  # always restore stdout, even on error
            timings.append(seconds)
            mean = statistics.mean(timings)
            # prints: run number, running mean, running standard deviation
            print("{} {:3.2f} {:3.2f}".format(
                1 + i, mean,
                statistics.stdev(timings, mean) if i > 1 else 0))

Using TensorFlow backend.
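
The timing pattern in `Benchmark.run` can also be written with the standard library's `contextlib.redirect_stdout`. This is a swapped-in technique, not what `core.py` does: the function name `benchmark` and its `repeats` parameter are made up for this sketch, and it returns the timings instead of printing a running mean.

```python
import contextlib
import io
import statistics
import time


def benchmark(function, repeats=5):
    """Time `function` several times and return the durations in seconds."""
    timings = []
    for _ in range(repeats):
        # Discard anything the function prints; stdout is restored
        # automatically when the with-block exits, even on error.
        with contextlib.redirect_stdout(io.StringIO()):
            start = time.perf_counter()
            function()
            timings.append(time.perf_counter() - start)
    return timings


timings = benchmark(lambda: print(sum(range(10000))), repeats=3)
spread = statistics.stdev(timings) if len(timings) > 1 else 0.0
print("runs={} mean={:.6f}s stdev={:.6f}s".format(
    len(timings), statistics.mean(timings), spread))
```

Returning the list keeps the timing separate from the reporting, so a test can assert on the numbers if it wants to.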


Put this in core_test.py:

In [None]:
from __future__ import absolute_import, division, print_function
import core
import unittest
from sklearn.model_selection import train_test_split


class testKerasModels(unittest.TestCase):

    def test_baseline_model(self):
        X, Y = core.data_setup()
        model = core.baseline_model()
        X_train, X_test, y_train, y_test = train_test_split(X, Y)

        # Fit the model
        model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=2,
                  batch_size=128, verbose=0)

        # Final evaluation of the model
        scores = model.evaluate(X_test, y_test, verbose=0)
        print("Accuracy: %.2f%%" % (scores[1] * 100))

    def test_benchmark(self):
        core.Benchmark.run(self.test_baseline_model)


if __name__ == '__main__':
    unittest.main()
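
Since CI builds take a while, a 'slim' test is one that checks a small piece of logic on a few rows of made-up data instead of training a model. The sketch below is one way that might look. The helper `one_hot` is hypothetical and pure Python, standing in for the label-encoding step; it is not a function in `core.py`.

```python
import unittest


def one_hot(labels):
    """Hypothetical helper: map each label to a one-hot row (classes sorted)."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [[1 if index[label] == i else 0 for i in range(len(classes))]
            for label in labels]


class TestOneHot(unittest.TestCase):

    def test_rows_have_single_one(self):
        rows = one_hot(['Yes', 'No', 'ambiguous', 'No'])
        self.assertEqual(len(rows), 4)
        for row in rows:
            self.assertEqual(len(row), 3)
            self.assertEqual(sum(row), 1)

    def test_same_label_same_row(self):
        rows = one_hot(['Yes', 'No', 'Yes'])
        self.assertEqual(rows[0], rows[2])


# Run the two tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOneHot)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these need no dataset and no Keras import, so they should finish in well under a second.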