_Lambda School Data Science_
## Productization Module 3, [Adding Data Science to a Web Application](https://github.com/LambdaSchool/DS-Unit-3-Sprint-4-Productization-and-Cloud/blob/master/module3-adding-data-science-to-a-web-application/README.md)

## Today's Plan:

### Templates (provided for you)
- `base.html`
- `prediction.html`
- `user.html`

### Functions (added by you)

#### `twitter.py`
- `add_or_update_user`
- `add_users`
- `update_all_users`

#### `predict.py`
- `predict_user`

#### `app.py`
- `@app.route('/user/<name>', methods=['GET'])`
- `@app.route('/user', methods=['POST'])`
- `@app.route('/compare', methods=['POST'])`
- `@app.route('/update')`

#### [GET and POST methods, explained](https://developer.mozilla.org/en-US/docs/Web/HTTP/Session#Request_methods)

HTTP defines a set of request methods indicating the desired action to be performed upon a resource. The most common requests are `GET` and `POST`:

- The `GET` method requests a data representation of the specified resource. Requests using `GET` should only retrieve data.
- The `POST` method sends data to a server so it may change its state. This is the method often used for HTML Forms.
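
To make the distinction concrete, here is a minimal sketch in plain Python, with no web framework involved. The `users` dict and the `handle` function are hypothetical stand-ins for server-side state and a route handler; they are not part of Flask or of this app.

```python
# Hypothetical in-memory stand-in for server-side state.
users = {}

def handle(method, name):
    """Dispatch on the request method, roughly as a route handler would."""
    if method == 'GET':
        # GET only reads: it never changes `users`.
        return users.get(name, 'not found')
    if method == 'POST':
        # POST sends data that changes the server's state.
        users[name] = 'added'
        return 'created'
    return 'method not allowed'

print(handle('GET', 'Austen'))   # not found
print(handle('POST', 'Austen'))  # created
print(handle('GET', 'Austen'))   # added
```

The same resource name gives a different `GET` result only after a `POST` has changed the state.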


# 1. route `/user/<name>`

### Prototype interactively

In [1]:
from mytwit.__init__ import *
from mytwit.twitter import *

type(APP)

flask.app.Flask

In [2]:
from mytwit.__init__ import *
from mytwit.twitter import *

with APP.app_context():
    name = 'Austen'
    tweets = User.query.filter(User.name == name).one().tweets
    for tweet in tweets:
        print(tweet.text)

What % of tax filings could be 100% automated? If the government guessed as to what your return should look like, it’d probably be 100% right... 90% of the time? https://t.co/CSS9KgQv8v
Just hit 24 Lambda School students with job offers so far this month. We're 9 days in.
RT @jeremybrady702: All I have to say is @LambdaSchool has taken my breath away with kindness tonight, thank you soo much!
There is nothing better than talking to recently hired Lambda School grads https://t.co/zRfvxTMFBI
I declare DM bankruptcy again, just FYI. Honestly considering locking it down.
RT @tstock915: “If vocational schools aren’t willing to put their money where their mouth is and offer ISA’s, they’ll probably be dead with…
Here's what few people get about ISAs:

If you don't have good terms, you have negative selection bias, and ISAs won't perform.

To avoid selection bias you need to have both great terms &amp; great outcomes.

If any of that is off it all fails.
RT @Austen: Lambda School is hiring a G

`with APP.app_context():` was needed above because we're running from a notebook rather than inside `flask run` or `flask shell`. For more information, see:

- http://flask-sqlalchemy.pocoo.org/2.3/contexts/
- http://flask.pocoo.org/docs/1.0/appcontext/

### Route in `TwitOff/twitoff/app.py`

Within the `create_app` factory function

```
    @app.route('/user/<name>')
    def user(name):
        tweets = User.query.filter(User.name == name).one().tweets
        return render_template('user.html', title=name, tweets=tweets)
```


### Template at `TwitOff/twitoff/templates/user.html`

`user.html` is like `base.html` except with a for loop iterating over tweets instead of users:

```
        {% for tweet in tweets %}
        <span class="stack">{{ tweet.text }}</span>
        {% endfor %}
```

# 2. Add new user 

### From notebook!

With [tqdm](https://github.com/tqdm/tqdm) for progress bars!

In [3]:
from tqdm.auto import tqdm

In [4]:
from mytwit.__init__ import *
from mytwit.twitter import *

def add_user(username):
    """Add a user and their Tweets"""
    twitter_user = TWITTER.get_user(username)
    db_user = User(id=twitter_user.id, name=username)
    DB.session.add(db_user)
    
    # We want as many recent non-retweet/reply statuses as we can get
    # 200 is a Twitter API limit, we'll usually see less due to exclusions
    tweets = twitter_user.timeline(
        count=200, exclude_replies=True, include_rts=False,
        tweet_mode='extended')
    db_user.newest_tweet_id = tweets[0].id
    
    # tqdm adds progress bar
    for tweet in tqdm(tweets): 
        # Calculate embedding on the full tweet, but truncate for storing
        embedding = BASILICA.embed_sentence(tweet.full_text,
                                            model='twitter')
        db_tweet = Tweet(id=tweet.id, text=tweet.full_text[:300],
                         embedding=embedding)
        db_user.tweets.append(db_tweet)
        DB.session.add(db_tweet)

    DB.session.commit()

In [5]:
type(APP)

flask.app.Flask

In [6]:
with APP.app_context():
    add_user('KenJennings')

## Make it fault-tolerant: add _or update_ user

What if you try to add a user that's already been added? You get a database error:

> IntegrityError: UNIQUE constraint failed: user.id

So, we'll make our function fault-tolerant and "idempotent"!

#### [Idempotent REST APIs](https://restfulapi.net/idempotent-rest-apis/)

> When making multiple identical requests has the same effect as making a single request – then that REST API is called idempotent.

>When you design REST APIs, you must realize that API consumers can make mistakes. They can write client code in such a way that there can be duplicate requests as well. These duplicate requests may be unintentional as well as intentional some time (e.g. due to timeout or network issues). You have to design fault-tolerant APIs in such a way that duplicate requests do not leave the system unstable.
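
As a rough illustration of the idea, here is a sketch using the standard-library `sqlite3` module rather than the app's Flask-SQLAlchemy models. The table layout, the small integer IDs, and the two helper functions are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)')

def add_user_naive(user_id, name):
    """Always INSERT: a repeated call violates the primary key."""
    conn.execute('INSERT INTO user (id, name) VALUES (?, ?)',
                 (user_id, name))

def add_user_idempotent(user_id, name):
    """INSERT only if the row is missing, so repeats are harmless."""
    row = conn.execute('SELECT id FROM user WHERE id = ?',
                       (user_id,)).fetchone()
    if row is None:
        conn.execute('INSERT INTO user (id, name) VALUES (?, ?)',
                     (user_id, name))

add_user_naive(1, 'KenJennings')
try:
    add_user_naive(1, 'KenJennings')  # duplicate request
except sqlite3.IntegrityError as e:
    error = str(e)
    print('Duplicate request failed:', error)

add_user_idempotent(2, 'Austen')
add_user_idempotent(2, 'Austen')  # second identical request changes nothing
count = conn.execute('SELECT COUNT(*) FROM user').fetchone()[0]
print(count)  # 2
```

The naive version fails on the second identical request, while the idempotent version leaves the table in the same state no matter how many times it is called.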

So, instead of assigning `db_user` to a new `User` ...

```
db_user = User(...)
```

We can assign `db_user` to an existing `User` **or** a new `User`:

```
    db_user = (User.query.get(twitter_user.id) or
               User(id=twitter_user.id, name=username))
```

This is a common pattern in web applications. If `User.query.get(twitter_user.id)` returns `None`, which is falsy, then `db_user` is assigned the new `User(id=twitter_user.id, name=username)` instead.

Here's a simpler demo of how **`or`** works in Python:

In [7]:
1 or 2

1

In [8]:
None or 2

2
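
The same fallback can be sketched with a plain dictionary lookup. The `cache` dict and `get_or_create` function here are hypothetical and only mimic the shape of the database lookup.

```python
# Hypothetical stand-in for rows that already exist in the database.
cache = {'Austen': 'existing user'}

def get_or_create(name):
    # dict.get returns None for a missing key; None is falsy,
    # so `or` falls through to the right-hand side.
    return cache.get(name) or 'new user for ' + name

print(get_or_create('Austen'))    # existing user
print(get_or_create('elonmusk'))  # new user for elonmusk

# Caveat: `or` tests truthiness, not `is None`,
# so other falsy values also trigger the fallback.
print(0 or 'fallback')   # fallback
print([] or 'fallback')  # fallback
```

This works for our lookup because a found `User` object is truthy, so only a missing user (`None`) reaches the right-hand side.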

And now here's our `add_or_update_user` function:

In [9]:
def add_or_update_user(username):
    """Add or update a user and their Tweets"""
    twitter_user = TWITTER.get_user(username)
    db_user = (User.query.get(twitter_user.id) or
               User(id=twitter_user.id, name=username))
    DB.session.add(db_user)
    
    # We want as many recent non-retweet/reply statuses as we can get
    # 200 is a Twitter API limit, we'll usually see less due to exclusions
    tweets = twitter_user.timeline(
        count=200, exclude_replies=True, include_rts=False,
        tweet_mode='extended', since_id=db_user.newest_tweet_id)
    if tweets:
        db_user.newest_tweet_id = tweets[0].id
        
    # tqdm adds progress bar    
    for tweet in tqdm(tweets):
        # Calculate embedding on the full tweet, but truncate for storing
        embedding = BASILICA.embed_sentence(tweet.full_text,
                                            model='twitter')
        db_tweet = Tweet(id=tweet.id, text=tweet.full_text[:300],
                         embedding=embedding)
        db_user.tweets.append(db_tweet)
        DB.session.add(db_tweet)
        
    DB.session.commit()

Two more changes were made in the function above. 

[Tweepy has a `since_id` parameter:](http://docs.tweepy.org/en/3.7.0/api.html?highlight=since_id)

> `since_id` – Returns only statuses with an ID greater than (that is, more recent than) the specified ID.

We use this parameter so we don't re-retrieve and re-embed tweets we already have in the database. (If `db_user.newest_tweet_id` is `None` then Tweepy gets all the tweets.)

Also, we check whether a user has any tweets before trying to access the id of their 0th tweet. (This will prevent an error if a user doesn't have any tweets.)

```
    if tweets:
        db_user.newest_tweet_id = tweets[0].id
```
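
Here is a small sketch of how these two changes work together. It does not call Tweepy: `TIMELINE`, `fetch_timeline`, and `state` are hypothetical stand-ins that only imitate the `since_id` behavior quoted above.

```python
# Hypothetical stand-in for a user's timeline: tweet IDs, newest first.
TIMELINE = [105, 104, 103, 102, 101]

def fetch_timeline(since_id=None):
    """Return IDs greater than since_id; with None, return everything."""
    if since_id is None:
        return list(TIMELINE)
    return [tweet_id for tweet_id in TIMELINE if tweet_id > since_id]

# Hypothetical stand-in for what we keep in the database.
state = {'newest_tweet_id': None, 'stored': []}

def update():
    """Fetch only unseen tweets and remember the newest ID."""
    tweets = fetch_timeline(since_id=state['newest_tweet_id'])
    if tweets:  # guard: tweets[0] would raise IndexError on an empty list
        state['newest_tweet_id'] = tweets[0]
    state['stored'].extend(tweets)
    return len(tweets)

first = update()
second = update()
print(first, second, state['newest_tweet_id'])  # 5 0 105
```

The second call fetches nothing new, and the guard keeps the empty result from raising an error.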

Now the function is "idempotent"!

In [10]:
with APP.app_context():
    add_or_update_user('KenJennings')

### We can add more fault-tolerance, with try / except / else blocks

In [11]:
def add_or_update_user(username):
    """Add or update a user and their Tweets, error if not a Twitter user."""
    try:
        twitter_user = TWITTER.get_user(username)
        db_user = (User.query.get(twitter_user.id) or
                   User(id=twitter_user.id, name=username))
        DB.session.add(db_user)
        # We want as many recent non-retweet/reply statuses as we can get
        # 200 is a Twitter API limit, we'll usually see less due to exclusions
        tweets = twitter_user.timeline(
            count=200, exclude_replies=True, include_rts=False,
            tweet_mode='extended', since_id=db_user.newest_tweet_id)
        if tweets:
            db_user.newest_tweet_id = tweets[0].id         
        # tqdm adds progress bar
        for tweet in tqdm(tweets):
            # Calculate embedding on the full tweet, but truncate for storing
            embedding = BASILICA.embed_sentence(tweet.full_text,
                                                model='twitter')
            db_tweet = Tweet(id=tweet.id, text=tweet.full_text[:300],
                             embedding=embedding)
            db_user.tweets.append(db_tweet)
            DB.session.add(db_tweet)
    except Exception as e:
        print('Error processing {}: {}'.format(username, e))
        raise  # re-raise the same exception for the caller to handle
    else:
        DB.session.commit()
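
The control flow of `try` / `except` / `else` can be seen in isolation with a toy example. `FakeSession` and `save` are hypothetical names invented for this sketch; the real function commits through `DB.session`.

```python
class FakeSession:
    """Hypothetical stand-in for DB.session that counts commits."""
    def __init__(self):
        self.commits = 0

    def commit(self):
        self.commits += 1

session = FakeSession()

def save(value):
    try:
        number = int(value)  # may raise ValueError
    except ValueError as e:
        print('Error processing {!r}: {}'.format(value, e))
        raise  # the caller still sees the exception
    else:
        session.commit()  # runs only when the try block succeeded
        return number

save('42')
try:
    save('not a number')
except ValueError:
    pass

print(session.commits)  # 1
```

Only the successful call reaches the `else` block, so nothing is committed when an error occurs partway through.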

# 2. Add multiple users

In [12]:
def add_users(users):
    """
    Add/update a list of users (strings of user names).
    May take a while, so run "offline" (interactive shell).
    """
    # tqdm adds progress bar
    for user in tqdm(users):
        add_or_update_user(user)

In [13]:
users = ['calebhicks', 'SteveMartinToGo', 'sadserver']

with APP.app_context():
    add_users(users)

# 3. Update all users

In [14]:
def update_all_users():
    """Update all Tweets for all Users in the User table."""
    # tqdm adds progress bar
    for user in tqdm(User.query.all()):
        add_or_update_user(user.name)

In [15]:
with APP.app_context():
    update_all_users()

# ASSIGNMENT

#### Add these functions to your Flask app
- Put `add_or_update_user`, `add_users`, and `update_all_users` in `twitter.py`
- Remove the `tqdm` progress bars from the for loops
- Import the functions in `app.py`

#### Replace your `/user/<name>` route with these routes

```
    @app.route('/user', methods=['POST'])
    @app.route('/user/<name>', methods=['GET'])
    def user(name=None, message=''):
        name = name or request.values['user_name']
        try:
            if request.method == 'POST':
                add_or_update_user(name)
                message = "User {} successfully added!".format(name)
            tweets = User.query.filter(User.name == name).one().tweets
        except Exception as e:
            message = "Error adding {}: {}".format(name, e)
            tweets = []
        return render_template('user.html', title=name, tweets=tweets,
                               message=message)
```

***You will also need to add this import to the top of the file:*** `from flask import request`

#### Add an `/update` route

This route should behave like the root route, except that it first calls your `update_all_users` function. It can then display an appropriate title on the page, such as "All tweets updated!"

# 4. Predict!

In [16]:
import numpy as np
from sklearn.linear_model import LogisticRegression

In [17]:
user1_name = 'Austen'
user2_name = 'elonmusk'

In [18]:
with APP.app_context():
    user1 = User.query.filter(User.name == user1_name).one()
    user2 = User.query.filter(User.name == user2_name).one()
    user1_embeddings = np.array([tweet.embedding for tweet in user1.tweets])
    user2_embeddings = np.array([tweet.embedding for tweet in user2.tweets])
    user1_labels = np.ones(len(user1.tweets))
    user2_labels = np.zeros(len(user2.tweets))

In [19]:
user1_embeddings.shape, user2_embeddings.shape, user1_labels.shape, user2_labels.shape

((58, 768), (52, 768), (58,), (52,))

In [20]:
user1_embeddings

array([[-0.260309  , -0.2633    ,  0.569704  , ...,  0.230238  ,
         0.0434244 ,  0.230399  ],
       [-0.367562  ,  0.319384  ,  0.389726  , ...,  0.526667  ,
        -0.00434549, -0.107877  ],
       [ 0.32024   , -0.0549341 ,  0.187368  , ...,  0.625672  ,
        -0.437755  , -0.0250464 ],
       ...,
       [-0.64133   , -0.0183684 ,  0.403406  , ...,  0.738118  ,
         0.142393  ,  0.312049  ],
       [-0.0619171 , -0.219514  ,  0.891416  , ...,  0.661542  ,
         0.500294  ,  0.0749681 ],
       [ 0.065296  , -0.142881  ,  1.01534   , ...,  0.740994  ,
         0.244514  , -0.101911  ]])

In [21]:
user1_labels

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1.])

In [22]:
user2_embeddings

array([[-0.623786 , -0.36228  ,  0.481585 , ...,  0.737665 ,  0.592326 ,
         0.456442 ],
       [-0.544913 , -0.302794 ,  0.668437 , ...,  0.22628  ,  0.291232 ,
         0.234701 ],
       [-0.812237 , -0.223204 ,  0.737502 , ...,  1.15634  ,  0.267936 ,
         0.129276 ],
       ...,
       [-0.233671 , -0.421423 ,  1.15558  , ...,  0.590557 ,  0.704866 ,
        -0.102063 ],
       [-0.280309 , -0.626599 ,  1.00127  , ...,  0.905657 ,  0.838759 ,
        -0.158449 ],
       [-0.0939891, -0.0589629,  0.661137 , ...,  0.867942 ,  0.442749 ,
        -0.166158 ]])

In [23]:
user2_labels

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0.])

In [24]:
embeddings = np.vstack([user1_embeddings, user2_embeddings])
labels = np.concatenate([user1_labels, user2_labels])

embeddings.shape, labels.shape

((110, 768), (110,))

In [25]:
log_reg = LogisticRegression(solver='lbfgs', max_iter=1000)
log_reg.fit(embeddings, labels)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=1000, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [26]:
log_reg.score(embeddings, labels)

1.0

In [27]:
from sklearn.model_selection import cross_val_score
cross_val_score(log_reg, embeddings, labels, cv=3)

array([0.94736842, 0.77777778, 0.77777778])

In [28]:
tweet_text = 'Income Share Agreements align incentives. Welcome to the future of education.'
tweet_embedding = BASILICA.embed_sentence(tweet_text, model='twitter')
log_reg.predict(np.array(tweet_embedding).reshape(1, -1))

array([1.])

In [30]:
# ^^^^^^ 1 is the label for user1, Austen

In [31]:
BASILICA

<basilica.Connection at 0x10ae6b6d8>

In [32]:
log_reg.predict_proba(np.array(tweet_embedding).reshape(1, -1))

array([[0.00728929, 0.99271071]])

In [33]:
tweet_text = 'SpaceX will launch another Tesla into orbit'
tweet_embedding = BASILICA.embed_sentence(tweet_text, model='twitter')
log_reg.predict(np.array(tweet_embedding).reshape(1, -1))

array([0.])

In [None]:
# ^^^^^^ 0 is the label for user2, Musk

In [34]:
log_reg.predict_proba(np.array(tweet_embedding).reshape(1, -1))

array([[0.98649962, 0.01350038]])

In [35]:
tweet_text = 'Today we launch a new initiative'
tweet_embedding = BASILICA.embed_sentence(tweet_text, model='twitter')
log_reg.predict_proba(np.array(tweet_embedding).reshape(1, -1))

array([[0.07898372, 0.92101628]])

In [37]:
np.array(tweet_embedding).shape

(768,)

In [36]:
np.array([tweet_embedding]).shape

(1, 768)

# ASSIGNMENT

### Create `TwitOff/twitoff/predict.py`

Refactor the notebook code into a function, named `predict_user`.

The code you need is already here. You just need to put it in a function in a `.py` file.

The function should take three strings as parameters:
- User 1 name
- User 2 name
- Tweet text

The function should determine and return which user is more likely to say a given tweet. (`return log_reg.predict(...)`)

Import what you need from `numpy`, `sklearn`, and your `.models` and `.twitter` modules.

### Add this `/compare` route

```
    @app.route('/compare', methods=['POST'])
    def compare(message=''):
        user1, user2 = sorted([request.values['user1'],
                               request.values['user2']])
        if user1 == user2:
            message = 'Cannot compare a user to themselves!'
        else:
            prediction = predict_user(user1, user2, request.values['tweet_text'])
            message = '"{}" is more likely to be said by {} than {}'.format(
                request.values['tweet_text'], user1 if prediction else user2,
                user2 if prediction else user1)
        return render_template('prediction.html', title='Prediction', message=message)
```

***You will also need to add this import to the top of the file:*** `from .predict import predict_user`