# "Wulu Data Analysis"
> "A summary of the findings from our study using the Wulu chatbot for the Stanford Longevity Design Challenge 2021."

- toc: false
- branch: master
- badges: false
- comments: true
- hide: false
- categories: [fastpages, jupyter]



In [None]:
#hide
import pandas as pd
import numpy as np
import altair as alt
from altair.expr import datum
import json
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

## Wulu Data Analysis

This post presents the data analysis of the attitude changes that we observed in our project for the Stanford Longevity Design Challenge. We will soon update the post with a more detailed report on how the project was planned and executed, and what we learned from it.

In [None]:
#hide
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)
!ls '/content/drive/MyDrive/WUL/ProjectH'

Mounted at /content/drive/
basic.csv  trend.csv  TRTAnalysis.csv


In [None]:
#hide
df = pd.read_csv('/content/drive/MyDrive/WUL/ProjectH/basic.csv')
df_trend = pd.read_csv('/content/drive/MyDrive/WUL/ProjectH/trend.csv')
df.drop(['Unnamed: 0'],axis=1, inplace=True)
df_trend.drop(['Unnamed: 0'],axis=1, inplace=True)


In [None]:
#hide
df1 = df[['Age','Gender','baseline score','endline score']]
df11 = pd.melt(df1, id_vars=['Gender','Age'], value_vars=[ 'baseline score', 'endline score'])
df11.head()

,Gender,Age,variable,value
0,F,17,baseline score,6
1,F,16,baseline score,10
2,M,18,baseline score,6
3,M,18,baseline score,6
4,F,17,baseline score,19


## Analysis
### Comparison between baseline and endline questionnaire scores using only the best option as the answer

The following chart plots score against age, with gender as the differentiating parameter.
Susceptibility to a positive change in attitude appears to increase with age. The change is also more evident among the girls.
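
One way to check this reading numerically is to compute each participant's change in score and summarise it by age and gender. The snippet below is a minimal sketch: the `demo` rows are invented purely for illustration and are not study data, and the `change` column and the `change_by_group` and `age_change_corr` names are assumptions. Only the column names `Age`, `Gender`, `baseline score` and `endline score` come from the dataset used in this post.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only (not study data).
# Column names match those selected from basic.csv in this post.
demo = pd.DataFrame({
    "Age": [16, 16, 17, 17, 18, 18],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "baseline score": [6, 6, 8, 7, 9, 8],
    "endline score": [9, 7, 12, 9, 15, 10],
})

# Per-participant change: a positive value means the endline score is higher.
demo["change"] = demo["endline score"] - demo["baseline score"]

# Mean change for each age (rows), with one column per gender.
change_by_group = (
    demo.groupby(["Age", "Gender"])["change"].mean().unstack("Gender")
)
print(change_by_group)

# Pearson correlation between age and change, as a rough trend indicator.
age_change_corr = demo["Age"].corr(demo["change"])
print(age_change_corr)
```

Running the same steps on `df1` instead of `demo` would show whether the pattern seen in the chart holds in the aggregated numbers. With a small sample the correlation is only indicative.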

In [None]:
#hide_input
# Radio buttons that switch between the baseline and endline scores
input_radio = alt.binding_radio(options=['baseline score','endline score'])
selection = alt.selection_single(fields=['variable'], bind=input_radio, name='Score')
# Axis limits, named so that they do not shadow the built-in range()
age_domain = [12,22]
score_domain = [-5,25]
alt.Chart(df11).mark_circle().encode(
    x=alt.X('Age:Q',scale=alt.Scale(domain=age_domain)),
    y=alt.Y('value:Q',scale=alt.Scale(domain=score_domain)),
    size='count()',
    color='Gender:N',
    tooltip=['Age','Gender','variable','value','count()']
).add_selection(
    selection
).transform_filter(
    selection
).interactive().properties(height=400, width=400, title='Combined charts for baseline and endline scores')

In [None]:
#hide
# Axis limits, named so that they do not shadow the built-in range()
age_domain = [12,22]
score_domain = [-5,25]
alt.Chart(df1).mark_circle().encode(
    x=alt.X('Age:Q',scale=alt.Scale(domain=age_domain)),
    y=alt.Y('baseline score:Q',scale=alt.Scale(domain=score_domain)),
    size = 'count()',
    color='Gender',
    tooltip=['Age','Gender','baseline score','count()']
).interactive().properties(height=400, width=400, title='Chart for best choice baseline scores')

In [None]:
#hide
# Axis limits, named so that they do not shadow the built-in range()
age_domain = [12,22]
score_domain = [-5,25]
alt.Chart(df1).mark_circle().encode(
    x=alt.X('Age:Q',scale=alt.Scale(domain=age_domain)),
    y=alt.Y('endline score:Q',scale=alt.Scale(domain=score_domain)),
    size = 'count()',
    color='Gender',
    tooltip=['Age','Gender','endline score','count()']
).interactive().properties(height=300, width=300, title='Chart for best choice endline scores')

In [None]:
#hide
# Axis limits, named so that they do not shadow the built-in range()
age_domain = [12,22]
score_domain = [-5,25]
alt.Chart(df_trend).mark_circle().encode(
    x=alt.X('Age:Q',scale=alt.Scale(domain=age_domain)),
    y=alt.Y('baseline score:Q',scale=alt.Scale(domain=score_domain)),
    size = 'count()',
    color='Gender',
    tooltip=['Age','Gender','baseline score','count()']
).interactive().properties(height=300, width=300, title='Chart for graded baseline scores')

In [None]:
#hide
# Axis limits, named so that they do not shadow the built-in range()
age_domain = [12,22]
score_domain = [-5,25]
alt.Chart(df_trend).mark_circle().encode(
    x=alt.X('Age:Q',scale=alt.Scale(domain=age_domain)),
    y=alt.Y('endline score:Q',scale=alt.Scale(domain=score_domain)),
    size = 'count()',
    color='Gender',
    tooltip=['Age','Gender','endline score','count()']
).interactive().properties(height=300, width=300, title='Chart for graded endline scores')

In [None]:
#hide
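# Keep the parents' education columns alongside the graded scores
dft = df_trend[['Age','Gender','Father edu', 'Mother edu' ,'baseline score','endline score']]
# Melt dft (not df1) so the education columns are carried into the long format
df11 = pd.melt(dft, id_vars=['Gender','Age','Father edu','Mother edu'], value_vars=[ 'baseline score', 'endline score'])
df11.head()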

In [None]:
#hide
#layered histogram - mother's education levels are the colours - x axis is age - y axis is score