# Analyzing global internet patterns

## 📖 Background
In this competition, you'll be exploring a dataset that highlights internet usage for different countries from 2000 to 2023. Your goal is to import, clean, analyze, and visualize the data in your preferred tool.

The end goal is a clean, self-explanatory, and interactive visualization. Through a thorough analysis, you'll dive deeper into how internet usage has changed over time and which countries are still most affected by a lack of internet access.


## 💾 Data

#### You have access to the following file, but you can supplement your data with other sources to enrich your analysis. 

### Internet Usage (`internet_usage.csv`)
|   Column name  |   Description | 
|---------------|-----------|
| Country Name | Name of the country |
| Country Code | The country's 3-character country code |
| 2000 | Individuals using the internet in 2000, as a % of the population |
| 2001 | Individuals using the internet in 2001, as a % of the population |
| 2002 | Individuals using the internet in 2002, as a % of the population |
| 2003 | Individuals using the internet in 2003, as a % of the population |
| ... | ... |
| 2023 | Individuals using the internet in 2023, as a % of the population |

**The data can be downloaded from the _Files_ section (_File_ > _Show workbook files_).**

## 💪 Challenge
Use a tool of your choice to create an interesting visual or dashboard that summarizes your analysis! 

Things to consider:
1. Use this Workspace to prepare your data (optional).
2. Stuck on where to start? Here are some ideas to get you started:
    - Visualize internet usage over time, by country
    - How has internet usage changed over time? Are any patterns emerging?
    - Consider bringing in other data to supplement your analysis
3. Create a screenshot of your main dashboard / visuals, and paste it into the designated field.
4. Summarize your findings in an executive summary.
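For the "patterns emerging" idea, one simple angle is the first year in which each country's usage crossed a threshold. The sketch below assumes the wide layout described in the Data section (one column per year) and an arbitrary 50% threshold; `first_year_above` is a hypothetical helper, and the toy rows reuse the Andorra and Angola values from the `data.head(10)` preview.

```python
import pandas as pd


def first_year_above(wide: pd.DataFrame, threshold: float = 50.0) -> pd.Series:
    """First year in which usage exceeds `threshold` %, per country (NaN if it never does)."""
    years = [c for c in wide.columns if str(c).isdigit()]
    usage = wide.set_index("Country Name")[years].astype(float)
    above = usage.gt(threshold).astype(int)
    # idxmax returns the first column holding the row maximum (the first 1). For countries
    # that never pass the threshold it would return the first year, so mask those out.
    first = pd.to_numeric(above.idxmax(axis=1))
    return first.where(above.any(axis=1))


# Toy rows copied from the preview (share of population online, %)
toy = pd.DataFrame({
    "Country Name": ["Andorra", "Angola"],
    "2005": [37.6058, 1.14337],
    "2006": [48.9368, 1.5],
    "2007": [70.87, 1.7],
})
print(first_year_above(toy))
```

On the full dataset this would be called on the frame after the `..` markers are replaced with NaN, e.g. `first_year_above(data)`.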

In [7]:
import pandas as pd
data = pd.read_csv("data/internet_usage.csv") 
data.head(10)

Unnamed: 0,Country Name,Country Code,2000,2001,2002,2003,2004,2005,2006,2007,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Afghanistan,AFG,..,0.00472257,0.0045614,0.0878913,0.105809,1.22415,2.10712,1.9,...,7,8.26,11,13.5,16.8,17.6,18.4,..,..,..
1,Albania,ALB,0.114097,0.325798,0.390081,0.9719,2.42039,6.04389,9.60999,15.0361,...,54.3,56.9,59.6,62.4,65.4,68.5504,72.2377,79.3237,82.6137,83.1356
2,Algeria,DZA,0.491706,0.646114,1.59164,2.19536,4.63448,5.84394,7.37598,9.45119,...,29.5,38.2,42.9455,47.6911,49.0385,58.9776,60.6534,66.2356,71.2432,..
3,American Samoa,ASM,..,..,..,..,..,..,..,..,...,..,..,..,..,..,..,..,..,..,..
4,Andorra,AND,10.5388,..,11.2605,13.5464,26.838,37.6058,48.9368,70.87,...,86.1,87.9,89.7,91.5675,..,90.7187,93.2056,93.8975,94.4855,..
5,Angola,AGO,0.105046,0.136014,0.270377,0.370682,0.464815,1.14337,1.5,1.7,...,21.3623,22,23.2,26,29,32.1294,36.6347,37.8067,39.2935,..
6,Antigua and Barbuda,ATG,6.48223,8.89929,12.5,17.2286,24.2665,27,30,34,...,67.78,70,73,76.2,79.6,83.2,86.8837,87.074,91.4123,..
7,Argentina,ARG,7.03868,9.78081,10.8821,11.9137,16.0367,17.7206,20.9272,25.9466,...,64.7,68.0431,70.969,74.2949,77.7,79.947,85.5144,87.1507,88.3754,89.229
8,Armenia,ARM,1.30047,1.63109,1.96041,4.57522,4.89901,5.25298,5.63179,6.02125,...,54.6228,59.1008,64.346,64.7449,68.2451,66.5439,76.5077,78.6123,77.0277,..
9,Aruba,ABW,15.4428,17.1,18.8,20.8,23,25.4,28,30.9,...,83.78,88.6612,93.5425,97.17,..,..,..,..,..,..


## ✍️ Judging criteria
| CATEGORY | WEIGHTING | DETAILS                                                              |
|:---------|:----------|:---------------------------------------------------------------------|
| **Visualizations** | 50% | <ul><li>Appropriateness of visualizations used.</li><li>Clarity of insight from visualizations.</li></ul> |
| **Summary** | 35%       | <ul><li>Clarity of insights: how clear and well presented the findings are.</li></ul> |
| **Votes** | 15% | <ul><li>Up voting - most upvoted entries get the most points.</li></ul> |

## 🧾 Executive summary
_In a couple of lines, write your main findings here._

## 📷 Visual/Dashboard screenshot
_Paste one screenshot of your visual/dashboard here._

## 🌐 Upload your dashboard (optional)
Ideally, paste the link to your publicly available online dashboard here.

Otherwise, upload your dashboard file to the _Files_ section (_File_ > _Show workbook files_).

## ⌛️ Time is ticking. Good luck!

In [8]:
!pip install pycountry country_converter



In [9]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import numpy as np

data.dtypes

Country Name    object
Country Code    object
2000            object
2001            object
2002            object
2003            object
2004            object
2005            object
2006            object
2007            object
2008            object
2009            object
2010            object
2011            object
2012            object
2013            object
2014            object
2015            object
2016            object
2017            object
2018            object
2019            object
2020            object
2021            object
2022            object
2023            object
dtype: object
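Every year column is `object` because the file marks missing values with `..`, so pandas cannot parse those columns as numbers. One option, sketched below on two rows taken from the preview, is to declare the marker at load time with `read_csv`'s `na_values` parameter; in this notebook that would be `pd.read_csv("data/internet_usage.csv", na_values="..")`.

```python
import io

import pandas as pd

# Two rows copied from the preview, including the ".." missing-value marker
raw_csv = """Country Name,Country Code,2000,2001,2002
Afghanistan,AFG,..,0.00472257,0.0045614
Albania,ALB,0.114097,0.325798,0.390081
"""

as_text = pd.read_csv(io.StringIO(raw_csv))
cleaned = pd.read_csv(io.StringIO(raw_csv), na_values="..")

print(as_text["2000"].dtype)  # not numeric: ".." keeps the whole column as text
print(cleaned["2000"].dtype)  # float64: ".." is parsed as NaN
```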

In [10]:
# Replace the ".." missing-value markers with NaN in every column
data = data.replace('..', np.nan)

data.head()
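Once the markers are NaN, a quick coverage check shows how much of each year is missing, which matters when comparing early and recent years (the preview already shows gaps between 2021 and 2023). A small sketch; `missing_share` is a hypothetical helper and the toy values come from the preview rows.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of countries with no value, per year column."""
    years = [c for c in df.columns if str(c).isdigit()]
    return (df[years].isna().mean() * 100).round(1)


toy = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria", "American Samoa"],
    "2022": [np.nan, 82.6137, 71.2432, np.nan],
    "2023": [np.nan, 83.1356, np.nan, np.nan],
})
print(missing_share(toy))  # 2022: 50.0, 2023: 75.0
```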

Unnamed: 0,Country Name,Country Code,2000,2001,2002,2003,2004,2005,2006,2007,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Afghanistan,AFG,,0.00472257,0.0045614,0.0878913,0.105809,1.22415,2.10712,1.9,...,7.0,8.26,11.0,13.5,16.8,17.6,18.4,,,
1,Albania,ALB,0.114097,0.325798,0.390081,0.9719,2.42039,6.04389,9.60999,15.0361,...,54.3,56.9,59.6,62.4,65.4,68.5504,72.2377,79.3237,82.6137,83.1356
2,Algeria,DZA,0.491706,0.646114,1.59164,2.19536,4.63448,5.84394,7.37598,9.45119,...,29.5,38.2,42.9455,47.6911,49.0385,58.9776,60.6534,66.2356,71.2432,
3,American Samoa,ASM,,,,,,,,,...,,,,,,,,,,
4,Andorra,AND,10.5388,,11.2605,13.5464,26.838,37.6058,48.9368,70.87,...,86.1,87.9,89.7,91.5675,,90.7187,93.2056,93.8975,94.4855,


In [11]:
# Year columns (2000 to 2023) as floats. Calling .astype on the selection returns a new
# DataFrame, which avoids the SettingWithCopyWarning raised by assigning column by column.
col_years = [col for col in data.columns if col.startswith("20")]
data_years = data[col_years].astype(float)

df_growth = data['Country Name'].copy().to_frame()

# Relative year-over-year change in %: (next - current) / current * 100
for i in range(len(col_years) - 1):
    current_year = col_years[i]
    next_year = col_years[i + 1]
    df_growth[f'{current_year}-{next_year}'] = ((data_years[next_year] - data_years[current_year]) / data_years[current_year]) * 100

# Keep an independent copy of the wide table (the next cell reshapes df_growth)
df_growth_temp_season = df_growth.copy()
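The year loop can also be written without iterating, by comparing each year with the previous one via `DataFrame.shift(axis=1)`. The sketch below additionally returns the percentage-point change, an alternative metric (not used in the original analysis) that is less sensitive to tiny starting values: Afghanistan's move from about 0.0046% to 0.088% of the population between 2002 and 2003 is roughly +1,827% in relative terms but only about 0.08 percentage points. Unlike the loop, it turns infinite values from a zero base into NaN. `yoy_changes` is an illustrative name and the toy values come from the preview.

```python
import numpy as np
import pandas as pd


def yoy_changes(usage: pd.DataFrame):
    """Return (relative % change, percentage-point change) between consecutive year columns."""
    previous = usage.shift(1, axis=1)  # each column now holds the previous year's value
    labels = [f"{a}-{b}" for a, b in zip(usage.columns[:-1], usage.columns[1:])]

    # Same formula as the loop; a zero base gives +/-inf, treated here as undefined (NaN)
    relative = ((usage - previous) / previous * 100).replace([np.inf, -np.inf], np.nan)
    points = usage - previous

    # The first column has no previous year, so drop it and label the periods
    return (relative.iloc[:, 1:].set_axis(labels, axis=1),
            points.iloc[:, 1:].set_axis(labels, axis=1))


# Toy values copied from the data preview (share of population online, %)
usage = pd.DataFrame(
    {"2001": [0.00472257, 0.325798], "2002": [0.0045614, 0.390081], "2003": [0.0878913, 0.9719]},
    index=["Afghanistan", "Albania"],
)
relative, points = yoy_changes(usage)
print(relative.round(2))
print(points.round(4))
```

In the notebook, `yoy_changes(data_years.set_index(data['Country Name']))` would give country-indexed tables.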

In [12]:
df_growth = pd.melt(df_growth, id_vars=["Country Name"], var_name="Year", value_name="Growth")

In [21]:
# Median year-over-year growth per country across all periods (NaN periods are skipped)
df_growth_all_years = df_growth.groupby("Country Name")['Growth'].median().sort_values(ascending=False).reset_index(name="Growth")
print(df_growth_all_years.head())

# Countries with no growth data at all still have a NaN median. fillna returns a new
# Series, so the result must be assigned back (note: these countries are drawn as 0% growth)
df_growth_all_years["Growth"] = df_growth_all_years["Growth"].fillna(0)

print("Any missing values left?", df_growth_all_years["Growth"].isna().any())

fig = px.choropleth(
    df_growth_all_years,
    locations="Country Name",
    locationmode="country names",
    color="Growth",
    hover_name="Country Name",
    projection="orthographic",
    title="Median Year-over-Year Growth per Country - Global Internet Patterns",
)

fig.show()

   Country Name     Growth
0       Myanmar  42.806283
1      Ethiopia  40.143024
2      Djibouti  33.520228
3   Timor-Leste  33.333333
4  Turkmenistan  32.252730
Any missing values left? False
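Matching on country names depends on Plotly recognising each spelling. Since the file already carries a `Country Code` column, a possibly more robust option is to join the medians back to those codes and use `locationmode="ISO-3"`. This assumes the codes are ISO 3166-1 alpha-3 (true for AFG, ALB and DZA in the preview); any non-standard code would simply not be drawn. `attach_codes` and `plot_median_growth` are illustrative names, and Plotly is imported inside the plotting function so the join can be used on its own.

```python
import pandas as pd


def attach_codes(medians: pd.DataFrame, raw: pd.DataFrame) -> pd.DataFrame:
    """Add the 'Country Code' column from the raw data to a per-country table."""
    codes = raw[["Country Name", "Country Code"]].drop_duplicates()
    return medians.merge(codes, on="Country Name", how="left", validate="one_to_one")


def plot_median_growth(medians_with_codes: pd.DataFrame):
    import plotly.express as px  # imported here so attach_codes works without Plotly

    return px.choropleth(
        medians_with_codes,
        locations="Country Code",
        locationmode="ISO-3",
        color="Growth",
        hover_name="Country Name",
        projection="orthographic",
        title="Median Year-over-Year Growth per Country",
    )
```

Usage in the notebook: `plot_median_growth(attach_codes(df_growth_all_years, data)).show()`.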


In [14]:
import country_converter as coco

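def get_continent_region(country_name):
    """Map a country name to its continent with country_converter (None if unmatched)."""
    try:
        country_code = coco.convert(names=country_name, to='ISO3')
        continent_name = coco.convert(names=country_code, to='continent')
        # coco does not raise for unknown names; it returns the string 'not found'
        if continent_name == 'not found':
            return None
        return continent_name

    except Exception as e:
        print(f"Error processing {country_name}: {e}")
        return None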

df_growth_temp_season['Region'] = df_growth_temp_season['Country Name'].apply(get_continent_region)
#df_growth['Region'] = df_growth['Country Name'].apply(get_continent_region)
#print(df_growth.head(10))


Channel Islands not found in regex
not found not found in regex
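country_converter could not match `Channel Islands`, so that row ends up with no usable region (`not found` or `None`, depending on how the lookup handles misses). A small manual override table is one way to patch such gaps before grouping by region. The mapping and `fill_regions` below are illustrative assumptions, to be extended as other misses show up.

```python
import pandas as pd

# Hypothetical overrides for names country_converter could not match
REGION_OVERRIDES = {"Channel Islands": "Europe"}


def fill_regions(df: pd.DataFrame, overrides: dict) -> pd.DataFrame:
    """Replace missing or 'not found' regions using `overrides`; unlisted names become NaN."""
    out = df.copy()
    missing = out["Region"].isna() | out["Region"].eq("not found")
    out.loc[missing, "Region"] = out.loc[missing, "Country Name"].map(overrides)
    return out
```

In the notebook: `df_growth_temp_season = fill_regions(df_growth_temp_season, REGION_OVERRIDES)`.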


In [15]:
df_growth_temp_season.head(20)

# add an interactive button for each region, so different points in the total growth data can be selected for analysis.

Unnamed: 0,Country Name,2000-2001,2001-2002,2002-2003,2003-2004,2004-2005,2005-2006,2006-2007,2007-2008,2008-2009,...,2014-2015,2015-2016,2016-2017,2017-2018,2018-2019,2019-2020,2020-2021,2021-2022,2022-2023,Region
0,Afghanistan,,-3.41276,1826.849213,20.386204,1056.943171,72.129233,-9.82953,-3.157895,92.934783,...,18.0,33.171913,22.727273,24.444444,4.761905,4.545455,,,,Asia
1,Albania,185.544756,19.730938,149.153381,149.036938,149.707279,59.00339,56.463222,58.684765,72.673931,...,4.788214,4.745167,4.697987,4.807692,4.817125,5.378962,9.809282,4.147562,0.631735,Europe
2,Algeria,31.402505,146.340429,37.930688,111.103418,26.096995,26.215875,28.134702,7.711304,10.314342,...,29.491525,12.422775,11.050285,2.825265,20.267953,2.841418,9.203441,7.560285,,Africa
3,American Samoa,,,,,,,,,,...,,,,,,,,,,Oceania
4,Andorra,,,20.300164,98.119057,40.12147,30.13099,44.819441,-1.171158,12.121645,...,2.090592,2.047782,2.08194,,,2.741331,0.742337,0.626215,,Europe
5,Angola,29.480418,98.786154,37.0982,25.394543,145.983886,31.191128,13.333333,11.764706,21.052632,...,2.985165,5.454545,12.068966,11.538462,10.791034,14.02236,3.199153,3.932636,,Africa
6,Antigua and Barbuda,37.287477,40.460643,37.8288,40.850098,11.2645,11.111111,13.333333,11.764706,10.526316,...,3.275302,4.285714,4.383562,4.461942,4.522613,4.427524,0.219028,4.982314,,America
7,Argentina,38.958015,11.259701,9.479788,34.607217,10.50029,18.095324,23.985053,8.347915,20.942211,...,5.167079,4.300069,4.686412,4.583222,2.891892,6.963864,1.913479,1.405267,0.96588,America
8,Armenia,25.423116,20.19018,133.380772,7.077037,7.225337,7.211335,6.915386,3.134731,146.376812,...,8.198042,8.875007,0.61993,5.40614,-2.49278,14.973273,2.750834,-2.015715,,Asia
9,Aruba,10.731215,9.94152,10.638298,10.576923,10.434783,10.23622,10.357143,68.28479,11.538462,...,5.826212,5.505565,3.877916,,,,,,,America


In [16]:
!pip install dash

Collecting dash
  Downloading dash-2.18.2-py3-none-any.whl (7.8 MB)
Collecting Flask<3.1,>=1.0.4
  Downloading flask-3.0.3-py3-none-any.whl (101 kB)
Collecting Werkzeug<3.1
  Downloading werkzeug-3.0.6-py3-none-any.whl (227 kB)
Collecting dash-html-components==2.0.0
  Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Collecting dash-core-components==2.0.0
  Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Collecting dash-table==5.0.0
  Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)
Collecting importlib-metadata
  Downloading importlib_metadata-8.6.1-py3-none-any.wh

In [17]:
# Reshape to long format: one row per country and year
data_temp = data.drop(columns=['Country Code'])
df_series = data_temp.melt(id_vars='Country Name', var_name='Year', value_name='Usage')

# Values are the share of the population using the internet (%), not growth
df_series['Usage'] = pd.to_numeric(df_series['Usage'], errors='coerce')
df_series['Usage'] = df_series['Usage'].round(2)

print(df_series.head())

fig = px.bar(df_series,
            x="Country Name",
            y="Usage",
            animation_frame="Year",
            animation_group="Country Name",
            hover_name="Country Name",
            title="Internet Usage by Country Over Time - Global Internet Patterns",
            labels={"Usage": "Internet users (% of population)", "Country Name": "Country"},
            template="plotly_dark"
)

fig.update_layout(
    showlegend=True,
    xaxis_title="Country",
    yaxis_title="Internet users (% of population)",
    margin=dict(l=20, r=20, t=60, b=100),
    updatemenus=[
        dict(
            type="buttons",
            buttons=[
                dict(label="Play",
                     method="animate",
                     args=[None, {"frame": {"duration": 700, "redraw": True}, "fromcurrent": True}]),
                dict(label="Pause",
                     method="animate",
                     args=[[None], {"frame": {"duration": 0, "redraw": False}, "mode": "immediate", "transition": {"duration": 0}}])
            ],
            x=0.1,
            y=-0.35,
            font=dict(size=10),
            pad={"r": 30, "t": 10},
        )
    ],
    sliders=[dict(
        active=0,
        yanchor="top",
        xanchor="left",
        currentvalue={
            "font": {"size": 12},
            "prefix": "Year: ",
            "visible": True,
            "xanchor": "right"
        },
        y=-0.30,
        x=0.1,
        pad={"b": 10, "t": 10},
        len=0.9,
    )],

    width=1200,
    height=800,
)
fig.update_yaxes(range=[0, df_series['Usage'].max()])
fig.update_xaxes(tickangle=-45)
fig.show()


     Country Name  Year  Usage
0     Afghanistan  2000    NaN
1         Albania  2000   0.11
2         Algeria  2000   0.49
3  American Samoa  2000    NaN
4         Andorra  2000  10.54
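With every country on one axis, each animation frame is hard to read. One option is to restrict the chart to the countries with the highest (or lowest) usage in a reference year. `top_countries` is a hypothetical helper that works on a long frame like the one built above; pass the year and value column names that frame uses. The toy values come from the data preview.

```python
import pandas as pd


def top_countries(long: pd.DataFrame, year_col: str, value_col: str,
                  year: str, n: int = 20, largest: bool = True) -> pd.DataFrame:
    """Keep every year for the n countries ranked highest (or lowest) in `year`."""
    snapshot = long[long[year_col] == year].dropna(subset=[value_col])
    ranked = snapshot.nlargest(n, value_col) if largest else snapshot.nsmallest(n, value_col)
    return long[long["Country Name"].isin(ranked["Country Name"])]


# Toy long-format rows copied from the preview
long = pd.DataFrame({
    "Country Name": ["Albania", "Algeria", "Angola", "American Samoa"] * 2,
    "Yr": ["2021"] * 4 + ["2022"] * 4,
    "Val": [79.3237, 66.2356, 37.8067, float("nan"), 82.6137, 71.2432, 39.2935, float("nan")],
})
print(top_countries(long, "Yr", "Val", "2022", n=2))
```

The filtered frame can then be passed to `px.bar` with the same arguments as before.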


In [18]:
def plot_country(selected_countries=None):
    """Line chart of year-over-year internet usage growth for the selected countries."""
    if selected_countries is None:
        print("No country selected")
        return

    # df_growth is already in long format (Country Name, Year, Growth) after pd.melt, so
    # melting it again raises "value_name (Growth) cannot match an element in the
    # DataFrame columns"; filtering is enough
    df_growth_filtered = df_growth[df_growth['Country Name'].isin(selected_countries)]

    plt.figure(figsize=(12, 6))
    sns.lineplot(x='Year', y='Growth', hue='Country Name', data=df_growth_filtered)
    plt.xticks(rotation=45)
    plt.title('Annual Growth in Internet Usage')
    plt.xlabel('Year')
    plt.ylabel('Growth (%)')
    plt.tight_layout()
    plt.show()

plot_country(["Albania"])

In [55]:
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)

app.layout = html.Div([
    html.H4("Year-over-Year Internet Usage Growth by Country - Global Internet Patterns"),
    dcc.Graph(id='graph'),
    dcc.Checklist(
        id="checklist",
        options=["Asia", "America", "Africa","Europe", "Oceania"],
        value=["America"],
        inline=True
    )
])


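@app.callback(
    Output("graph", "figure"),
    Input("checklist", "value"))
def update_line_chart(continents):
    # Filter the wide growth table (which carries the 'Region' column from
    # country_converter) to the selected continents, instead of Plotly's
    # px.data.gapminder() sample dataset
    mask = df_growth_temp_season['Region'].isin(continents)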
    
    fig = px.line(df_growth_temp_season[mask], x="")