<a href="https://colab.research.google.com/github/tramyynt/COVID-19-impact-on-Digital-Learning-/blob/main/Covid_19_Impact_on_Digital_Learning_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **COVID-19 Impact on Digital Learning**
This notebook explores trends in digital learning during the COVID-19 pandemic.

### **Problem Statement**
The COVID-19 pandemic has disrupted learning for more than 56 million students in the United States. In the spring of 2020, most state and local governments across the U.S. closed educational institutions to stop the spread of the virus. In response, schools and teachers have attempted to reach students remotely through distance learning tools and digital platforms. To this day, concerns about a worsening digital divide and long-term learning loss among America’s most vulnerable learners continue to grow.

### **What should we focus on?**


1.   What is the state of digital connectivity and engagement in 2020?
2.   What is the effect of the COVID-19 pandemic on online and distance learning, and how might this evolve in the future?
3.   How does student engagement with different types of education technology change over the course of the pandemic?
4.   How does student engagement with online learning platforms relate to geography? To demographic context (e.g., race/ethnicity, ESL, learning disability)? To learning context? To socioeconomic status?
5.   Do certain state interventions, practices, or policies (e.g., stimulus, reopening, eviction moratorium) correlate with an increase or decrease in online engagement?






## Data Preparation


1.   Import the necessary libraries
2.   Get the data through the Kaggle API



In [129]:
import numpy as np 
import pandas as pd

import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import re

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [130]:
!pip install -U -q kaggle==1.5.8

In [131]:
!ls -la -r

total 127216
drwxr-xr-x 1 root root      4096 Aug 25 13:35 sample_data
-rw-r--r-- 1 root root      6117 Jul 27 17:36 README.md
-rw-r--r-- 1 root root     44683 Jul 27 17:37 products_info.csv
-rw-r--r-- 1 root root 130173024 Aug 26 18:06 learnplatform-covid19-impact-on-digital-learning.zip
drwxr-xr-x 2 root root      4096 Aug 26 18:06 .ipynb_checkpoints
drwxr-xr-x 2 root root      4096 Aug 26 18:06 engagement_data
-rw-r--r-- 1 root root     13560 Jul 27 17:36 districts_info.csv
drwxr-xr-x 4 root root      4096 Aug 25 13:35 .config
drwxr-xr-x 1 root root      4096 Aug 26 18:06 ..
drwxr-xr-x 1 root root      4096 Aug 26 18:06 .


In [132]:
# Upload kaggle.json before running this cell.
# Move it to /root/.kaggle, where the Kaggle CLI looks for credentials, and restrict its permissions.
!mv /kaggle.json /root/.kaggle
!chmod 600 /root/.kaggle/kaggle.json

mv: cannot stat '/kaggle.json': No such file or directory
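
The `mv` above fails when `kaggle.json` is not at the filesystem root. As a minimal sketch, the same step can be done in Python with an explicit check on the source path. The function name `install_kaggle_credentials` and its parameters are hypothetical, introduced only for illustration; the default target `~/.kaggle` is where the Kaggle CLI looks for the file by default.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(source, target_dir=None):
    """Copy kaggle.json into target_dir and make it readable by the owner only.

    `source` is wherever the file was uploaded (an assumption: adjust to your setup).
    """
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"Upload kaggle.json first; nothing found at {source}")
    # Default to ~/.kaggle when no target directory is given.
    target_dir = Path(target_dir) if target_dir is not None else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copy(source, target)
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return target


# Example call (hypothetical upload location):
# install_kaggle_credentials("kaggle.json")
```

In Colab, uploaded files typically land in the current working directory rather than in `/`, so a relative path such as `"kaggle.json"` is a reasonable first guess.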


In [133]:
!kaggle competitions download -c "learnplatform-covid19-impact-on-digital-learning" 

learnplatform-covid19-impact-on-digital-learning.zip: Skipping, found more recently modified local copy (use --force to force download)


In [134]:
# unzip the data (-o overwrites existing files without prompting)
!unzip -o '*.zip'

Archive:  learnplatform-covid19-impact-on-digital-learning.zip
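
Extraction can also be done from Python with the standard-library `zipfile` module, which never waits for interactive input. The sketch below is one possible way to do it; the helper name `extract_archives` and its `overwrite` flag are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archives(directory=".", overwrite=True):
    """Extract every .zip in `directory` into that directory.

    Returns the names of the archive members that were written.
    """
    directory = Path(directory)
    extracted = []
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                if not overwrite and (directory / member).exists():
                    continue  # keep the file that is already on disk
                zf.extract(member, directory)
                extracted.append(member)
    return extracted


# Example call: extract_archives(".", overwrite=True)
```

With `overwrite=False`, files that already exist are left untouched, which is useful when the notebook is re-run after the data has been unpacked once.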

In [136]:
# read each district's CSV and store the file name (without extension) in a new district_id column
files = glob.glob("engagement_data/*.csv")
list_of_df = [
    pd.read_csv(file).assign(district_id=os.path.splitext(os.path.basename(file))[0])
    for file in files
]
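
The loading step can be sketched as a small function that takes the district id from the file stem. The name `load_engagement` is hypothetical, and the sketch assumes every CSV in the folder belongs to one district and is named after its id.

```python
from pathlib import Path

import pandas as pd


def load_engagement(folder):
    """Read every CSV in `folder` and tag its rows with the file stem as district_id."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path)
        frame["district_id"] = path.stem  # file name without the .csv extension
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    return pd.concat(frames, ignore_index=True)


# Example call: engagement = load_engagement("engagement_data")
```

Taking the stem is safer than `str.strip(".csv")`, because `strip` removes any of the characters `.`, `c`, `s`, `v` from both ends of the string: `"stats.csv".strip(".csv")` returns `"tat"`. Purely numeric file names are unaffected, but other names would be silently mangled.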

In [137]:
engagement = pd.concat(list_of_df, ignore_index=True)

In [138]:
engagement.shape

(22324190, 5)

In [139]:
engagement.head(5)

Unnamed: 0,time,lp_id,pct_access,engagement_index,district_id
0,2020-01-01,76649.0,0.27,8.03,4666
1,2020-01-02,76649.0,0.13,5.35,4666
2,2020-01-03,92844.0,0.13,8.03,4666
3,2020-01-03,76649.0,0.13,5.35,4666
4,2020-01-04,76649.0,0.4,17.4,4666


In [140]:
districts = pd.read_csv("districts_info.csv")
products = pd.read_csv("products_info.csv")

In [141]:
print(districts.shape)
print(products.shape)

(233, 7)
(372, 6)


In [142]:
districts.head(5)

Unnamed: 0,district_id,state,locale,pct_black/hispanic,pct_free/reduced,county_connections_ratio,pp_total_raw
0,8815,Illinois,Suburb,"[0, 0.2[","[0, 0.2[","[0.18, 1[","[14000, 16000["
1,2685,,,,,,
2,4921,Utah,Suburb,"[0, 0.2[","[0.2, 0.4[","[0.18, 1[","[6000, 8000["
3,3188,,,,,,
4,2238,,,,,,


In [143]:
products.head(5)

Unnamed: 0,LP ID,URL,Product Name,Provider/Company Name,Sector(s),Primary Essential Function
0,13117,https://www.splashmath.com,SplashLearn,StudyPad Inc.,PreK-12,LC - Digital Learning Platforms
1,66933,https://abcmouse.com,ABCmouse.com,"Age of Learning, Inc",PreK-12,LC - Digital Learning Platforms
2,50479,https://www.abcya.com,ABCya!,"ABCya.com, LLC",PreK-12,"LC - Sites, Resources & Reference - Games & Si..."
3,92993,http://www.aleks.com/,ALEKS,McGraw-Hill PreK-12,PreK-12; Higher Ed,LC - Digital Learning Platforms
4,73104,https://www.achieve3000.com/,Achieve3000,Achieve3000,PreK-12,LC - Digital Learning Platforms


## Data Cleaning and Preprocessing


1.   Handle missing values
2.   Create dummy variables for the product sectors




In [144]:
print(districts.info())
print(engagement.info())
print(products.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 233 entries, 0 to 232
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   district_id               233 non-null    int64 
 1   state                     176 non-null    object
 2   locale                    176 non-null    object
 3   pct_black/hispanic        176 non-null    object
 4   pct_free/reduced          148 non-null    object
 5   county_connections_ratio  162 non-null    object
 6   pp_total_raw              118 non-null    object
dtypes: int64(1), object(6)
memory usage: 12.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22324190 entries, 0 to 22324189
Data columns (total 5 columns):
 #   Column            Dtype  
---  ------            -----  
 0   time              object 
 1   lp_id             float64
 2   pct_access        float64
 3   engagement_index  float64
 4   district_id       object 
dtypes: float64(3), obj

In [145]:
# share of missing values per column in districts
districts.isnull().sum()/len(districts)
# keep only the districts with a known state
districts = districts[districts.state.notna()].reset_index(drop=True)
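
A minimal sketch of the same two steps on a toy frame (the three rows below are illustrative, not the full dataset): compute the share of missing values per column, then keep only the rows with a known state.

```python
import pandas as pd

# Toy stand-in for districts_info.csv
toy = pd.DataFrame({
    "district_id": [8815, 2685, 4921],
    "state": ["Illinois", None, "Utah"],
    "locale": ["Suburb", None, "Suburb"],
})

# isnull().mean() is the same as isnull().sum() / len(toy)
missing_share = toy.isnull().mean()

# Drop rows whose state is missing and renumber the index
kept = toy.dropna(subset=["state"]).reset_index(drop=True)

print(missing_share)
print(kept)
```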

In [146]:
# convert time to datetime and district_id to int64 (the same dtype as in districts)
engagement['time'] = pd.to_datetime(engagement['time'])
engagement['district_id'] = engagement['district_id'].astype('int64')
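
Matching dtypes matter if the two tables are joined on `district_id` later; that join is an assumption here, since pandas will not match string keys against integer keys. A toy sketch with made-up rows:

```python
import pandas as pd

# Toy stand-ins; the state value is a placeholder, not real data
engagement_toy = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-02"],
    "pct_access": [0.27, 0.13],
    "district_id": ["4666", "4666"],  # strings, as taken from file names
})
districts_toy = pd.DataFrame({"district_id": [4666], "state": ["Example"]})

# Convert to the dtypes used by the districts table
engagement_toy["time"] = pd.to_datetime(engagement_toy["time"])
engagement_toy["district_id"] = engagement_toy["district_id"].astype("int64")

# With both keys as int64, the join finds the matching district
merged = engagement_toy.merge(districts_toy, on="district_id", how="left")
print(merged.dtypes)
```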

In [147]:
# one-hot encode the semicolon-separated sectors of each product
sectors = products['Sector(s)'].str.get_dummies(sep="; ")
sectors.columns = [f"sector_{c.replace(' ', '')}" for c in sectors.columns]
products = products.join(sectors)
products = products.drop(columns="Sector(s)")

del sectors
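
To see what `str.get_dummies` produces, here is a small sketch on a toy Series (the values are illustrative): each distinct label between the `"; "` separators becomes its own 0/1 column.

```python
import pandas as pd

# Toy stand-in for the Sector(s) column
toy_sectors = pd.Series(
    ["PreK-12", "PreK-12; Higher Ed", "Corporate"], name="Sector(s)"
)

# Split on "; " and create one indicator column per distinct label
dummies = toy_sectors.str.get_dummies(sep="; ")

# Prefix the columns and remove spaces, as done for the products table
dummies.columns = ["sector_" + c.replace(" ", "") for c in dummies.columns]
print(dummies)
```

A product listed under several sectors gets a 1 in each of the corresponding columns, so the rows do not have to sum to 1.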

## **Exploratory Data Analysis**

### 1. Districts


In [148]:
districts.tail(5)

Unnamed: 0,district_id,state,locale,pct_black/hispanic,pct_free/reduced,county_connections_ratio,pp_total_raw
171,9515,New York,Rural,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[18000, 20000["
172,8103,Tennessee,Rural,"[0.2, 0.4[",,"[0.18, 1[","[8000, 10000["
173,4929,Virginia,Rural,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[12000, 14000["
174,7975,California,City,"[0.6, 0.8[","[0.6, 0.8[","[0.18, 1[",
175,7164,California,City,"[0.6, 0.8[","[0.6, 0.8[","[0.18, 1[",
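
Several district columns hold ranges written as strings such as `[0.2, 0.4[`. Assuming this notation means a left-closed, right-open interval, a hedged sketch for turning such a string into numeric bounds could look like this (the helper `parse_interval` is hypothetical):

```python
import re

# Matches strings like "[0.2, 0.4[" or "[14000, 16000["
_INTERVAL = re.compile(r"^\[\s*([0-9.]+)\s*,\s*([0-9.]+)\s*\[$")


def parse_interval(text):
    """Return (lower, upper) as floats, or None if `text` is missing or malformed."""
    if not isinstance(text, str):
        return None  # e.g. NaN for districts without this field
    match = _INTERVAL.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Example call: districts["pct_free/reduced"].map(parse_interval)
```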


**Number of school districts per state**

In [149]:
districts_by_state = districts['state'].value_counts().to_frame().reset_index(drop=False)
districts_by_state.columns = ['state','count_districts']
districts_by_state

Unnamed: 0,state,count_districts
0,Connecticut,30
1,Utah,29
2,Massachusetts,21
3,Illinois,18
4,California,12
5,Ohio,11
6,New York,8
7,Indiana,7
8,Missouri,6
9,Washington,6
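
The column names produced by `value_counts().to_frame().reset_index()` have differed between pandas versions, which is why the columns are renamed explicitly above. An alternative sketch that names the count column directly, shown on toy data:

```python
import pandas as pd

# Toy stand-in for the state column of districts
toy = pd.DataFrame({"state": ["Utah", "Utah", "Ohio", "Utah", "Ohio", "Texas"]})

# Count rows per state, largest first, with an explicit name for the count column
counts = (
    toy.groupby("state")
    .size()
    .sort_values(ascending=False)
    .reset_index(name="count_districts")
)
print(counts)
```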


In [150]:
# map full state names to the two-letter abbreviations expected by Plotly's 'USA-states' location mode
us_state = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District Of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}
# replace state names with their abbreviations
# https://stackoverflow.com/questions/40075106/replace-values-in-pandas-series-with-dictionary
districts_by_state['state_abbrev'] = districts_by_state['state'].replace(us_state)
fig = go.Figure()
layout = dict(
    title_text="Number of School Districts per State",
    geo_scope='usa',
)

fig.add_trace(
    go.Choropleth(
        locations=districts_by_state.state_abbrev,
        z=districts_by_state.count_districts,
        locationmode='USA-states',
        marker_line_color='white',
        geo='geo',
        colorscale='Reds',
    )
)

fig.update_layout(layout)
fig.show()
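
`Series.replace` leaves any state name that is not in the dictionary unchanged, so a spelling mismatch would go unnoticed and that state would simply not appear on the map. A small sketch for checking the mapping, using `Series.map` on toy data (the three-entry dictionary and the lowercase "of" are illustrative):

```python
import pandas as pd

# Toy subset of the name-to-abbreviation dictionary
abbrev = {"Connecticut": "CT", "Utah": "UT", "District Of Columbia": "DC"}

# Toy state column; the last name differs from the dictionary key in capitalization
toy = pd.DataFrame({"state": ["Connecticut", "Utah", "District of Columbia"]})

# map() yields NaN for names missing from the dictionary
toy["state_abbrev"] = toy["state"].map(abbrev)

# Names that could not be mapped and would be absent from the choropleth
unmapped = toy.loc[toy["state_abbrev"].isna(), "state"].tolist()
print(unmapped)
```

An empty `unmapped` list means every state in the table has an abbreviation.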
