# Mastering Applied Skills in Management, Analytics and Entrepreneurship

## DATA COLLECTION TECHNIQUES
## Part IX. What to do with the data collected

### 1. About

We have collected some data, but what comes next? Here we will use a framework that helps us build a small application based on our data.

[Streamlit](https://streamlit.io/) is a framework that offers a faster way to build and share data applications. It turns data scripts into shareable web apps in minutes. Apps are written in pure Python, and no front‑end experience is required.

### 2. Installation

Installation is simple in our environment: run the command below in a terminal (without the leading `!`) or directly in a notebook cell:

In [None]:
!pip install streamlit

### 3. How it works

As described in [Main concepts](https://docs.streamlit.io/library/get-started/main-concepts), you write an ordinary Python script containing all the elements of your future app and then run it with `streamlit run your_script.py [-- script args]`.
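Anything placed after `--` is passed to your script rather than to Streamlit, and, per the Streamlit docs, the script sees it in `sys.argv`. A minimal sketch using `argparse`; the `--n-points` option and the `parse_app_args` name are hypothetical, chosen only for illustration:

```python
import argparse
import sys


def parse_app_args(argv=None):
    """Parse custom options given after `--` in `streamlit run app.py -- ...`.

    `--n-points` is a hypothetical option used only for illustration.
    """
    parser = argparse.ArgumentParser(description='Options for a demo Streamlit app')
    parser.add_argument('--n-points', type=int, default=10,
                        help='how many points to collect (hypothetical option)')
    # When argv is None, read the real command-line arguments of the script
    return parser.parse_args(sys.argv[1:] if argv is None else argv)


# Inside the app script: args = parse_app_args(); then use args.n_points
```

You would then start the app with `streamlit run stapp.py -- --n-points 5`.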

#### 3.1. Python script with app

Streamlit's architecture allows you to write apps the same way you write plain Python scripts. Let's create a sample script with the `%%writefile` magic command, which saves the cell's contents to a file:

In [None]:
%%writefile stapp.py

import streamlit as st

# Title of our demo app
st.title('Meet the first Streamlit application')
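Outside Jupyter the `%%writefile` magic is not available. A minimal plain-Python equivalent, assuming you want the same file name `stapp.py`; `APP_SOURCE` and `write_app` are illustrative names, not part of Streamlit:

```python
from pathlib import Path

# Same content as the %%writefile cell above
APP_SOURCE = """import streamlit as st

# Title of our demo app
st.title('Meet the first Streamlit application')
"""


def write_app(path='stapp.py', source=APP_SOURCE):
    """Write the app source to `path` (UTF-8) and return it as a Path."""
    target = Path(path)
    target.write_text(source, encoding='utf-8')
    return target
```

Calling `write_app()` in any Python session produces the same `stapp.py` as the magic cell.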

#### 3.2. Run application

Running the application is easy: open a terminal and type `streamlit run manutils/stapp.py` (adjust the path to wherever `stapp.py` was saved), or use the Jupyter interface as shown below. Note that inside Jupyter we have to find the proper URL to open the Streamlit application:

In [None]:
# pick a non-privileged port (ports below 1024 are reserved),
# preferably in the range from 10000 to 64000

PORT = 20000
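Instead of hard-coding `PORT`, you can ask the operating system which port is currently free. A rough sketch; `find_free_port` and its default range are our own choices, and the check is only a heuristic, since another process could take the port before Streamlit starts:

```python
import socket


def find_free_port(start=20000, end=20100, host='127.0.0.1'):
    """Return the first port in [start, end) that can be bound on `host`.

    Heuristic only: the port may be taken again before Streamlit starts,
    and a port free on 127.0.0.1 is not guaranteed free on every interface.
    """
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port  # the socket is closed when the `with` block exits
    raise RuntimeError(f'No free port found in range {start}-{end - 1}')
```

In the notebook you could then write `PORT = find_free_port()`.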

In [None]:
!streamlit run stapp.py --server.port {PORT} --browser.gatherUsageStats False

We are working in our own JupyterHub environment, so the URLs printed above will not work. To reach the application's interface, we route it through the JupyterHub proxy under the user's service prefix:

In [None]:
import os

In [None]:
print('Streamlit available at:',
      'https://jhas01.gsom.spbu.ru{}proxy/{}/'.format(
          os.environ['JUPYTERHUB_SERVICE_PREFIX'], PORT))

!streamlit run stapp.py --server.port {PORT} --browser.gatherUsageStats False
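The `!streamlit run` cell blocks the notebook until you interrupt it. One alternative, sketched below under the assumption that Streamlit is installed in the kernel's environment, is to start it as a background process with `subprocess.Popen` and compose the proxy URL in a helper. The helper names and the `/user/jdoe/` prefix used in the test are hypothetical:

```python
import subprocess
import sys


def streamlit_command(script='stapp.py', port=20000):
    """Build the same command as the `!streamlit run ...` cell, as an argv list."""
    # `python -m streamlit` uses the Streamlit installed in this kernel's environment
    return [
        sys.executable, '-m', 'streamlit', 'run', script,
        '--server.port', str(port),
        '--browser.gatherUsageStats', 'False',
    ]


def proxy_url(host, service_prefix, port):
    """Compose https://<host><prefix>proxy/<port>/ as in the cell above."""
    if not service_prefix.endswith('/'):
        service_prefix += '/'
    return f'https://{host}{service_prefix}proxy/{port}/'


def launch_in_background(script='stapp.py', port=20000):
    """Start Streamlit without blocking the notebook; returns a Popen handle."""
    return subprocess.Popen(streamlit_command(script, port))


# Usage in the notebook (not executed here):
# proc = launch_in_background('stapp.py', PORT)
# print(proxy_url('jhas01.gsom.spbu.ru', os.environ['JUPYTERHUB_SERVICE_PREFIX'], PORT))
# proc.terminate()  # stop the app when you are done
```

Keeping the `Popen` handle matters: without `proc.terminate()` the server keeps running after you move on.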

### 4. Basic examples

#### 4.1. Nice headers

In [None]:
%%writefile stapp.py

import streamlit as st

st.header('Nice looking header string', divider='rainbow')
st.header('_Here is a header under the line_ :fire:')

st.subheader('Subheader is also here', divider='rainbow')
st.subheader(':blue[_We like Streamlit_] :star:')

#### 4.2. Text

In [None]:
%%writefile stapp.py

import streamlit as st

st.header('Just header', divider='rainbow')
st.text('Just text under the header')

#### 4.3. Write

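Along with Streamlit's *magic* (a variable or literal on its own line is written to the app automatically; not to be confused with Jupyter magic commands such as `%%writefile`), `st.write()` is Streamlit's "Swiss Army knife". You can pass almost anything to `st.write()`: text, data, Matplotlib figures, charts and more, and it picks a suitable way to render each type.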
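Conceptually, "pass almost anything" means dispatching on the argument's type. The toy sketch below uses `functools.singledispatch` to show the idea; it is not Streamlit's actual implementation, and `render` with its string outputs is purely illustrative:

```python
from functools import singledispatch


@singledispatch
def render(obj):
    """Fallback for unknown types: show the object's repr."""
    return f'repr: {obj!r}'


@render.register
def _(obj: str):
    # strings would be rendered as Markdown
    return f'markdown: {obj}'


@render.register
def _(obj: dict):
    # dicts would be rendered as an interactive JSON view
    return f'json with keys: {", ".join(sorted(obj))}'


@render.register
def _(obj: list):
    # row-like data would be rendered as a table
    return f'table with {len(obj)} rows'
```

Adding support for a new type means registering one more function, without touching the existing ones.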

In [None]:
%%writefile stapp.py

import streamlit as st
import pandas as pd

st.header('Demo of write function', divider='rainbow')

st.write("Here's a demo table from the DataFrame:")
fruits_data = pd.DataFrame(
    {
        'fruits': ['apple', 'peach', 'pineapple', 'watermelon'],
        'color': ['green', 'orange', 'yellow', 'stripes'],
        'weight': [1, 2, 5, 10]
    }
)
st.write(fruits_data)

In [None]:
%%writefile stapp.py

import streamlit as st
import numpy as np
import pandas as pd

st.header('Demo of write function', divider='rainbow')
st.subheader('Table and plot in one application')

st.divider()

st.write("Here's a demo table from the DataFrame:")
fruits_data = pd.DataFrame(
    {
        'fruits': ['apple', 'peach', 'pineapple', 'watermelon'],
        'color': ['green', 'orange', 'yellow', 'stripes'],
        'weight': [1, 2, 5, 10]
    }
)
st.write(fruits_data)

st.divider()

st.write("Here's a demo chart for fruits:")
chart_data = pd.DataFrame(
     np.random.randn(20, 4),
     columns=['apple', 'peach', 'pineapple', 'watermelon']
)
st.line_chart(chart_data)
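The chart uses `np.random.randn`, so it changes every time Streamlit reruns the script, which happens on every user interaction. A sketch of seeded demo data using only the standard library (with NumPy the analogue would be `np.random.default_rng(42).standard_normal((20, 4))`); `demo_chart_data` is a hypothetical helper:

```python
import random


def demo_chart_data(n_rows=20,
                    columns=('apple', 'peach', 'pineapple', 'watermelon'),
                    seed=None):
    """Return {column: [values]} filled with standard-normal noise.

    With a fixed `seed` the numbers are identical on every Streamlit rerun,
    so the chart does not jump around when the user interacts with the app.
    """
    rng = random.Random(seed)  # private generator, does not touch global state
    return {col: [rng.gauss(0.0, 1.0) for _ in range(n_rows)] for col in columns}
```

In the app this would become `st.line_chart(pd.DataFrame(demo_chart_data(seed=42)))`.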

#### 4.4. Map

In [None]:
import os
import json
import time
import requests
import numpy as np
import pandas as pd
from datetime import datetime

In [None]:
api_url_loc = 'http://api.open-notify.org/iss-now.json'
n_points = 10
positions = []

In [None]:
for p in range(n_points):
    response = requests.get(api_url_loc, timeout=10)
    d = response.json()
    uts = d['timestamp']  # Unix timestamp in seconds
    # utcfromtimestamp() is deprecated since Python 3.12; time.gmtime() gives the same UTC time
    d['timestamp'] = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(uts))
    positions.append(d)
    print(p, 'UTC time collected:', d['timestamp'])
    time.sleep(1)

In [None]:
coords = [
    (float(x['iss_position']['latitude']), float(x['iss_position']['longitude']))
    for x in positions
]
coords

In [None]:
df = pd.DataFrame(
    coords,
    columns=['lat', 'lon']
)
df.head()
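Parsing one API response can be isolated in a small function and checked without network access. A sketch, assuming the `iss-now.json` payload shape used above (a `message` field equal to `success`, a Unix `timestamp`, and string coordinates under `iss_position`); `parse_iss_position` is a hypothetical name, and the sample payload in the test is made up:

```python
from datetime import datetime, timezone


def parse_iss_position(payload):
    """Turn one iss-now.json payload into (lat, lon, 'YYYY-mm-dd HH:MM:SS' UTC).

    The API returns latitude/longitude as strings, hence the float() calls.
    """
    if payload.get('message') != 'success':
        raise ValueError(f'Unexpected API response: {payload!r}')
    position = payload['iss_position']
    # timezone-aware replacement for the deprecated datetime.utcfromtimestamp()
    when = datetime.fromtimestamp(payload['timestamp'], tz=timezone.utc)
    return (
        float(position['latitude']),
        float(position['longitude']),
        when.strftime('%Y-%m-%d %H:%M:%S'),
    )
```

With such a helper, `coords = [parse_iss_position(d)[:2] for d in raw_responses]` would replace the list comprehension above, where `raw_responses` stands for the unmodified JSON replies.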

In [None]:
%%writefile stapp.py

import os
import json
import time
import requests
import numpy as np
import pandas as pd
from datetime import datetime
import streamlit as st

st.header('ISS over Earth', divider='rainbow')

st.write('Demo of ISS coordinates over time on a map:')
         
api_url_loc = 'http://api.open-notify.org/iss-now.json'
n_points = 10
positions = []
for p in range(n_points):
    response = requests.get(api_url_loc, timeout=10)
    d = response.json()
    uts = d['timestamp']
    # utcfromtimestamp() is deprecated since Python 3.12; time.gmtime() gives the same UTC time
    d['timestamp'] = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(uts))
    positions.append(d)
    time.sleep(1)
coords = [
    (float(x['iss_position']['latitude']), float(x['iss_position']['longitude']))
    for x in positions
]
map_data = pd.DataFrame(
    coords,
    columns=['lat', 'lon']
)
st.map(map_data)
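Because Streamlit reruns the whole script on every interaction, this app waits about ten seconds each time it is refreshed. Separating the collection loop from the network call makes it easier to test and to cache. A sketch with a hypothetical `fetch_positions` helper, where `get_json` is any zero-argument callable you supply:

```python
import time


def fetch_positions(get_json, n_points=10, delay=1.0):
    """Call `get_json()` n_points times and return [{'lat': ..., 'lon': ...}].

    `get_json` is injected (e.g. a lambda wrapping requests.get(...).json())
    so the loop can be tested without network access.
    """
    positions = []
    for i in range(n_points):
        d = get_json()
        positions.append({
            'lat': float(d['iss_position']['latitude']),
            'lon': float(d['iss_position']['longitude']),
        })
        if i < n_points - 1:
            time.sleep(delay)  # no need to wait after the last sample
    return positions
```

In the app you could pass `lambda: requests.get(api_url_loc, timeout=10).json()` as `get_json`, draw the result with `st.map(pd.DataFrame(positions))`, and put the call inside a small zero-argument function decorated with `@st.cache_data(ttl=60)` so that reruns within a minute reuse the same points.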

## <font color='red'>INTERMEDIATE QUIZ #9-1</font>
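Take the code below and add a table of ISS crew members above the map with the ISS trajectory. Note that `astros.json` lists everyone currently in space, so keep only the people whose `craft` is `ISS`.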

#### HINT

In [None]:
%%writefile stapp.py

import os
import json
import time
import requests
import numpy as np
import pandas as pd
from datetime import datetime
import streamlit as st

st.header('ISS over Earth', divider='rainbow')

st.write('ISS crew:')
api_url_crew = 'http://api.open-notify.org/astros.json'
################
# YOUR CODE HERE
################

st.divider()

st.write('Demo of ISS coordinates over time on a map:')

api_url_loc = 'http://api.open-notify.org/iss-now.json'
n_points = 10
positions = []
for p in range(n_points):
    response = requests.get(api_url_loc, timeout=10)
    d = response.json()
    uts = d['timestamp']
    # utcfromtimestamp() is deprecated since Python 3.12; time.gmtime() gives the same UTC time
    d['timestamp'] = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(uts))
    positions.append(d)
    time.sleep(1)
coords = [
    (float(x['iss_position']['latitude']), float(x['iss_position']['longitude']))
    for x in positions
]
map_data = pd.DataFrame(
    coords,
    columns=['lat', 'lon']
)
st.map(map_data)

### 5. Advanced tools

#### 5.1. Chat

In [None]:
%%writefile stapp.py

import streamlit as st

name = st.chat_input('What is your name?')
if name:
    st.write(f'Hello, {name}!')
else:
    st.write('My name is Streamlit.')

#### 5.2. Censor chat

In [None]:
%%writefile stapp.py

import re
import streamlit as st

st.header('Chat that hates f-words', divider='rainbow')

def repl(m):
    # m[0] is the whole matched word; keep only its length
    return '<CENSORED(' + str(len(m[0])) + ')>'

# Initialize chat history with help of `st.session_state`
# https://docs.streamlit.io/library/api-reference/session-state
if 'messages' not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

# React to user input
# NOTE walrus operator in Python
# https://docs.python.org/3/whatsnew/3.8.html
if msg := st.chat_input('Enter your message'):
    # Display user's message in chat message container
    st.chat_message('user').markdown(msg)
    # Add user's message to chat history
    st.session_state.messages.append({'role': 'user', 'content': msg})
    
    # Censor filter with RE
    msg = re.sub(r'\b[fF]\w*', repl, msg)
    
    answer = f'Censored: {msg}'
    # Display assistant response in chat message container
    with st.chat_message('assistant'):
        st.markdown(answer)
    # Add assistant response to chat history
    st.session_state.messages.append({'role': 'assistant', 'content': answer})
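The censoring step can be tested on its own, outside Streamlit. A sketch that precompiles the same pattern and uses a lambda in place of `repl`; `F_WORD` and `censor` are illustrative names:

```python
import re

# Words starting with f/F; \b anchors the match at the start of a word,
# so an inner "f" (as in "offer") is not matched
F_WORD = re.compile(r'\b[fF]\w*')


def censor(text):
    """Replace every f-word with <CENSORED(n)>, where n is the word's length."""
    return F_WORD.sub(lambda m: f'<CENSORED({len(m[0])})>', text)
```

For example, `censor('fun and Food for all')` gives `'<CENSORED(3)> and <CENSORED(4)> <CENSORED(3)> all'`.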

#### 5.3. Input

In [None]:
%%writefile stapp.py

import streamlit as st
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'fruits': ['apple', 'peach', 'pineapple', 'watermelon'],
        'weight': [1, 2, 5, 10]
    }
)
df = df.set_index('fruits')
st.write(df)
st.bar_chart(df)

In [None]:
print('Streamlit available at:',
      'https://jhas01.gsom.spbu.ru{}proxy/{}/'.format(
          os.environ['JUPYTERHUB_SERVICE_PREFIX'], PORT))

!streamlit run stapp.py --server.port {PORT} --browser.gatherUsageStats False

##### 5.3.1. Site analyzer

In [None]:
%%writefile stapp.py

import streamlit as st

st.header('Internet sites word analyzer', divider='rainbow')

url = st.text_input('Input URL', '')
if url:
    st.write('The URL for analysis is', url)
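If the user types `example.com` without a scheme, `urlopen` raises an error and the app crashes. A small check with `urllib.parse.urlparse` can catch this first; `is_http_url` is a hypothetical helper:

```python
from urllib.parse import urlparse


def is_http_url(url):
    """True if `url` has an http(s) scheme and a host, e.g. 'https://example.com'."""
    parts = urlparse(url.strip())
    return parts.scheme in ('http', 'https') and bool(parts.netloc)
```

In the app you might write `if url and not is_http_url(url): st.error('Please enter a full URL starting with http:// or https://')` followed by `st.stop()`.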

In [None]:
%%writefile stapp.py

import os
import re
import pandas as pd
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from collections import Counter
import streamlit as st

st.header('Internet sites word analyzer', divider='rainbow')

url = st.text_input('Input URL', '')
if url:
    st.write('The URL for analysis is', url)
    
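    # Some sites reject requests without a browser-like User-Agent header
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urlopen(request, timeout=10)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    # Drop <script> and <style> so their code is not counted as words
    for tag in soup(['script', 'style']):
        tag.decompose()
    
    # separator=' ' stops text from adjacent tags gluing into one word
    text = soup.get_text(separator=' ')
    for ch in ['\n', '\t', '\r']:
        text = text.replace(ch, ' ')
    # ё/Ё are outside the а-я range, so list them explicitly
    text = re.sub('[^а-яА-ЯёЁa-zA-Z]+', ' ', text).strip().lower()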
    
    freqs = dict(Counter(text.split()))
    freqs = dict(sorted(
        freqs.items(), 
        key=lambda item: item[1], 
        reverse=True
    ))
    limit = 10
    freqs = [(k, v) for k, v in freqs.items() if v >= limit]
    df = pd.DataFrame(
        freqs,
        columns=['word', 'count']
    )
    
    st.divider()
    st.write(df)
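The processing above relies on BeautifulSoup. For comparison, here is a standard-library-only sketch that swaps in `html.parser.HTMLParser`: it collects visible text, skips `<script>` and `<style>`, joins pieces with spaces, and applies the same letter filter extended with `ё`/`Ё`. `html_to_words` and `_TextExtractor` are hypothetical names:

```python
import re
from html.parser import HTMLParser

# Keep Latin and Cyrillic letters, including ё/Ё (outside the а-я range)
NON_LETTERS = re.compile(r'[^а-яА-ЯёЁa-zA-Z]+')


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {'script', 'style'}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_words(html):
    """Return the lower-cased words of an HTML string."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # join with spaces so text from adjacent tags does not glue together
    text = ' '.join(extractor.parts)
    return NON_LETTERS.sub(' ', text).lower().split()
```

The resulting list can go straight into `Counter(...)` in place of `text.split()`.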

##### 5.3.2. Site analyzer with a limit, in a more Pythonic style

In [None]:
%%writefile stapp.py

import os
import re
import pandas as pd
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from collections import Counter
import streamlit as st

def soup_from_url(url):
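    # Some sites reject requests without a browser-like User-Agent header
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urlopen(request, timeout=10)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    # Drop <script> and <style> so their code is not counted as words
    for tag in soup(['script', 'style']):
        tag.decompose()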
    return soup

def text_proc(soup):
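    # separator=' ' stops text from adjacent tags gluing into one word
    text = soup.get_text(separator=' ')
    for ch in ['\n', '\t', '\r']:
        text = text.replace(ch, ' ')
    # ё/Ё are outside the а-я range, so list them explicitly
    text = re.sub('[^а-яА-ЯёЁa-zA-Z]+', ' ', text).strip().lower()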
    return text

def df_lim(text, limit=1):
    freqs = dict(Counter(text.split()))
    freqs = dict(sorted(
        freqs.items(), 
        key=lambda item: item[1], 
        reverse=True
    ))
    freqs = [(k, v) for k, v in freqs.items() if v >= limit]
    df = pd.DataFrame(
        freqs,
        columns=['word', 'count']
    )
    return df

st.header('Internet sites word analyzer', divider='rainbow')

url = st.text_input('Input URL', '')
if url:
    st.write('The URL for analysis is', url)
    limit = st.slider('Select lower word count limit', 0, 100, 1)
    
    # processing part
    soup = soup_from_url(url)
    text = text_proc(soup)
    df = df_lim(text, limit)
    
    # output part
    st.divider()
    st.write(df)
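The counting inside `df_lim` can be shortened: `Counter.most_common()` already returns `(word, count)` pairs sorted by count in descending order, so the `dict`/`sorted`/`lambda` steps are not needed. A sketch with a hypothetical `word_counts` helper:

```python
from collections import Counter


def word_counts(words, limit=1):
    """Return (word, count) pairs with count >= limit, most frequent first."""
    return [(word, count)
            for word, count in Counter(words).most_common()
            if count >= limit]
```

Inside `df_lim` this would become `pd.DataFrame(word_counts(text.split(), limit), columns=['word', 'count'])`.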

##### 5.3.3. Site analyzer with diagram

In [None]:
%%writefile stapp.py

import os
import re
import pandas as pd
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from collections import Counter
import streamlit as st

def soup_from_url(url):
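    # Some sites reject requests without a browser-like User-Agent header
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urlopen(request, timeout=10)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    # Drop <script> and <style> so their code is not counted as words
    for tag in soup(['script', 'style']):
        tag.decompose()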
    return soup

def text_proc(soup):
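    # separator=' ' stops text from adjacent tags gluing into one word
    text = soup.get_text(separator=' ')
    for ch in ['\n', '\t', '\r']:
        text = text.replace(ch, ' ')
    # ё/Ё are outside the а-я range, so list them explicitly
    text = re.sub('[^а-яА-ЯёЁa-zA-Z]+', ' ', text).strip().lower()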
    return text

def df_lim(text, limit=1):
    freqs = dict(Counter(text.split()))
    freqs = dict(sorted(
        freqs.items(), 
        key=lambda item: item[1], 
        reverse=True
    ))
    freqs = [(k, v) for k, v in freqs.items() if v >= limit]
    df = pd.DataFrame(
        freqs,
        columns=['word', 'count']
    )
    df = df.set_index('word')
    return df

st.header('Internet sites word analyzer', divider='rainbow')

url = st.text_input('Input URL', '')
if url:
    st.write('The URL for analysis is', url)
    limit = st.slider('Select lower word count limit', 0, 100, 1)
    
    # processing part
    soup = soup_from_url(url)
    text = text_proc(soup)
    df = df_lim(text, limit)
    
    # table output part
    st.divider()
    st.write('Top-5 words from the site')
    st.write(df.head(5))
    
    # plot output part
    st.divider()
    st.write('Words from the site')
    st.bar_chart(df)

## <font color='red'>LAB WORK #6</font>

We built a nice analyzer, but it also needs an upper limit for the word count. Your home assignment is to update the code so that it uses both a lower and an upper limit, and to build the new Streamlit app.

#### HINT

In [None]:
%%writefile stapp.py

import os
import re
import pandas as pd
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from collections import Counter
import streamlit as st

low_limit, upp_limit = st.slider(
    'Select limits',
    0, 100, (10, 90)
)
st.write('Lower limit:', low_limit)
st.write('Upper limit:', upp_limit)

In [None]:
# you will also need to change a function `df_lim`
# for example like that

def df_lim(text, low_limit, upp_limit):
    freqs = dict(Counter(text.split()))
    freqs = dict(sorted(
        freqs.items(), 
        key=lambda item: item[1], 
        reverse=True
    ))
    freqs = ### YOUR CODE HERE ####
    df = pd.DataFrame(
        freqs,
        columns=['word', 'count']
    )
    df = df.set_index('word')
    return df