

# **COVID-19 Impact on Digital Learning**
The goal of this Kaggle competition is to uncover trends in digital learning.

### **Problem Statement**
The COVID-19 pandemic has disrupted learning for more than 56 million students in the United States. In the spring of 2020, most states and local governments across the U.S. closed educational institutions to stop the spread of the virus. In response, schools and teachers have attempted to reach students remotely through distance learning tools and digital platforms. To this day, concerns about the worsening digital divide and long-term learning loss among America’s most vulnerable learners continue to grow.

### **What should we focus on?**


1.   What is the picture of digital connectivity and engagement in 2020?
2.   What is the effect of the COVID-19 pandemic on online and distance learning, and how might it evolve in the future?
3. How does student engagement with different types of education technology change over the course of the pandemic?
4. How does student engagement with online learning platforms relate to geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?
5. Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with an increase or decrease in online engagement?






## Data Preparation


1.   Import the necessary libraries
2.   Get the data from the Kaggle API



In [60]:
import numpy as np 
import pandas as pd

import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import re

In [2]:
!pip install -U -q kaggle==1.5.8



In [9]:
!ls -la -r

total 20
drwxr-xr-x 1 root root 4096 Aug 25 13:35 sample_data
-rw-r--r-- 1 root root   66 Aug 26 18:04 kaggle.json
drwxr-xr-x 4 root root 4096 Aug 25 13:35 .config
drwxr-xr-x 1 root root 4096 Aug 26 17:49 ..
drwxr-xr-x 1 root root 4096 Aug 26 18:04 .


In [11]:
# MUST UPLOAD kaggle.json FIRST (it lands in the current working directory)
# Create /root/.kaggle if needed, then move kaggle.json into it
!mkdir -p /root/.kaggle
!mv kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
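
The same credential setup can also be done from Python instead of shell commands. The following is a minimal sketch, not part of the original notebook: the helper name `install_kaggle_token` is hypothetical, and it assumes the token was uploaded as `kaggle.json`. The Kaggle CLI looks for the token in `~/.kaggle/kaggle.json`.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src: Path, dest_dir: Path) -> Path:
    """Copy a Kaggle API token into dest_dir and restrict it to the owner.

    Hypothetical helper for illustration; it is not part of the kaggle package.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    dest.chmod(stat.S_IRUSR | stat.S_IWUSR)  # same effect as `chmod 600`
    return dest
```

In Colab this could be called as `install_kaggle_token(Path("kaggle.json"), Path.home() / ".kaggle")` after uploading the file.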

In [12]:
!kaggle competitions download -c "learnplatform-covid19-impact-on-digital-learning" 

Downloading learnplatform-covid19-impact-on-digital-learning.zip to /content
100% 124M/124M [00:02<00:00, 52.4MB/s]
100% 124M/124M [00:02<00:00, 54.1MB/s]


In [13]:
# unzip data
!unzip '*.zip'

Archive:  learnplatform-covid19-impact-on-digital-learning.zip
  inflating: README.md               
  inflating: districts_info.csv      
  inflating: engagement_data/1000.csv  
  inflating: engagement_data/1039.csv  
  inflating: engagement_data/1044.csv  
  inflating: engagement_data/1052.csv  
  inflating: engagement_data/1131.csv  
  inflating: engagement_data/1142.csv  
  inflating: engagement_data/1179.csv  
  inflating: engagement_data/1204.csv  
  inflating: engagement_data/1270.csv  
  inflating: engagement_data/1324.csv  
  inflating: engagement_data/1444.csv  
  inflating: engagement_data/1450.csv  
  inflating: engagement_data/1470.csv  
  inflating: engagement_data/1536.csv  
  inflating: engagement_data/1549.csv  
  inflating: engagement_data/1558.csv  
  inflating: engagement_data/1570.csv  
  inflating: engagement_data/1584.csv  
  inflating: engagement_data/1624.csv  
  inflating: engagement_data/1705.csv  
  inflating: engagement_data/1712.csv  
  inflating: engageme

In [14]:
# Derive district_id from each file name (e.g. "1000.csv" -> "1000") and add it as a new column
files = glob.glob("engagement_data/*.csv")
list_of_df = [pd.read_csv(file).assign(district_id=os.path.splitext(os.path.basename(file))[0]) for file in files]
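
A note on extracting the district id from a file name: `str.strip` takes a set of characters to remove from both ends, not a suffix, so it only happens to work when the id itself contains none of the characters `.`, `c`, `s`, `v`. The sketch below uses a hypothetical helper, `district_id_from_path`, and a made-up file name `scores.csv` to show the difference.

```python
from pathlib import Path


def district_id_from_path(path: str) -> str:
    """Return the file name without its extension (hypothetical helper)."""
    return Path(path).stem


numeric_path = "engagement_data/1000.csv"
tricky_name = "scores.csv"  # made-up name, only to illustrate the pitfall

# Digits are not in the stripped character set, so numeric ids survive intact.
stripped_numeric = Path(numeric_path).name.strip(".csv")

# Letters from the set {'.', 'c', 's', 'v'} are eaten from both ends of the name.
stripped_tricky = tricky_name.strip(".csv")

print(stripped_numeric, district_id_from_path(numeric_path))
print(stripped_tricky, district_id_from_path(tricky_name))
```

For the numeric file names in `engagement_data/` both approaches give the same result; taking the stem simply avoids relying on that coincidence.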

In [15]:
engagement = pd.concat(list_of_df, ignore_index= True)

In [16]:
engagement.shape

(22324190, 5)

In [62]:
engagement.head(5)

Unnamed: 0,time,lp_id,pct_access,engagement_index,district_id
0,2020-01-01,53627.0,0.0,,3322
1,2020-01-01,13591.0,0.0,0.04,3322
2,2020-01-01,49050.0,0.0,0.12,3322
3,2020-01-01,98265.0,0.02,0.76,3322
4,2020-01-01,49785.0,0.01,0.16,3322


In [51]:
districts = pd.read_csv("districts_info.csv")
products = pd.read_csv("products_info.csv")

In [52]:
print(districts.shape)
print(products.shape)

(233, 7)
(372, 6)


In [53]:
districts.head(5)

Unnamed: 0,district_id,state,locale,pct_black/hispanic,pct_free/reduced,county_connections_ratio,pp_total_raw
0,8815,Illinois,Suburb,"[0, 0.2[","[0, 0.2[","[0.18, 1[","[14000, 16000["
1,2685,,,,,,
2,4921,Utah,Suburb,"[0, 0.2[","[0.2, 0.4[","[0.18, 1[","[6000, 8000["
3,3188,,,,,,
4,2238,,,,,,


In [27]:
products.head(5)

Unnamed: 0,LP ID,URL,Product Name,Provider/Company Name,Sector(s),Primary Essential Function
0,13117,https://www.splashmath.com,SplashLearn,StudyPad Inc.,PreK-12,LC - Digital Learning Platforms
1,66933,https://abcmouse.com,ABCmouse.com,"Age of Learning, Inc",PreK-12,LC - Digital Learning Platforms
2,50479,https://www.abcya.com,ABCya!,"ABCya.com, LLC",PreK-12,"LC - Sites, Resources & Reference - Games & Si..."
3,92993,http://www.aleks.com/,ALEKS,McGraw-Hill PreK-12,PreK-12; Higher Ed,LC - Digital Learning Platforms
4,73104,https://www.achieve3000.com/,Achieve3000,Achieve3000,PreK-12,LC - Digital Learning Platforms


## Data cleaning and Preprocessing


1.   Handle missing values
2.   One-hot encode the product sectors




In [None]:
print(districts.info())
print(engagement.info())
print(products.info())

In [54]:
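# Share of missing values per column in districts
districts.isnull().sum()/len(districts)
# Preview districts with a known state; the result is only displayed, not assigned,
# so `districts` itself still contains all 233 rows
districts[districts.state.notna()].reset_index(drop=True)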

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 233 entries, 0 to 232
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   district_id               233 non-null    int64 
 1   state                     176 non-null    object
 2   locale                    176 non-null    object
 3   pct_black/hispanic        176 non-null    object
 4   pct_free/reduced          148 non-null    object
 5   county_connections_ratio  162 non-null    object
 6   pp_total_raw              118 non-null    object
dtypes: int64(1), object(6)
memory usage: 12.9+ KB


Unnamed: 0,district_id,state,locale,pct_black/hispanic,pct_free/reduced,county_connections_ratio,pp_total_raw
0,8815,Illinois,Suburb,"[0, 0.2[","[0, 0.2[","[0.18, 1[","[14000, 16000["
1,4921,Utah,Suburb,"[0, 0.2[","[0.2, 0.4[","[0.18, 1[","[6000, 8000["
2,5987,Wisconsin,Suburb,"[0, 0.2[","[0, 0.2[","[0.18, 1[","[10000, 12000["
3,3710,Utah,Suburb,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[6000, 8000["
4,7177,North Carolina,Suburb,"[0.2, 0.4[","[0.2, 0.4[","[0.18, 1[","[8000, 10000["
...,...,...,...,...,...,...,...
171,9515,New York,Rural,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[18000, 20000["
172,8103,Tennessee,Rural,"[0.2, 0.4[",,"[0.18, 1[","[8000, 10000["
173,4929,Virginia,Rural,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[12000, 14000["
174,7975,California,City,"[0.6, 0.8[","[0.6, 0.8[","[0.18, 1[",
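
The range columns above (for example `pct_black/hispanic` or `pp_total_raw`) are stored as strings such as `[0, 0.2[`, which denote a half-open interval. If numeric bounds are needed later, one possible approach is to parse them with a regular expression. This is only a sketch under that assumption: `parse_interval` and `interval_midpoint` are hypothetical helpers that do not appear in the original notebook.

```python
import re
from typing import Optional, Tuple

# Matches strings like "[0, 0.2[" or "[14000, 16000[" (lower bound, upper bound).
_INTERVAL = re.compile(r"^\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\[$")


def parse_interval(text) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) as floats, or None for missing or malformed values."""
    if not isinstance(text, str):
        return None  # NaN cells are floats, not strings
    match = _INTERVAL.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def interval_midpoint(text) -> Optional[float]:
    """Midpoint of the interval, a rough numeric stand-in for the range."""
    bounds = parse_interval(text)
    if bounds is None:
        return None
    return (bounds[0] + bounds[1]) / 2
```

Applied to a column, this could look like `districts["pp_total_raw"].map(interval_midpoint)`. Using the midpoint is a simplification; whether it is appropriate depends on the analysis.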


In [27]:
# Parse `time` as datetime and cast district_id to int64 to match districts.district_id
engagement['time'] = pd.to_datetime(engagement['time'])
engagement['district_id'] = engagement['district_id'].astype('int64')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22324190 entries, 0 to 22324189
Data columns (total 5 columns):
 #   Column            Dtype         
---  ------            -----         
 0   time              datetime64[ns]
 1   lp_id             float64       
 2   pct_access        float64       
 3   engagement_index  float64       
 4   district_id       object        
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 851.6+ MB


time                0.000000
lp_id               0.000024
pct_access          0.000602
engagement_index    0.240923
district_id         0.000000
dtype: float64
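
The reason for casting `district_id` to the same type as in `districts` is that the two tables can then be joined on that key. The sketch below illustrates this on small toy frames (the rows are made up for the example, not taken from the full dataset) and also shows `isna().mean()` as a compact way to get the share of missing values per column.

```python
import pandas as pd

# Toy stand-ins for the real tables; values are illustrative only.
engagement_demo = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-02"],
    "engagement_index": [0.04, None],
    "district_id": ["3322", "8815"],  # strings, as derived from file names
})
districts_demo = pd.DataFrame({
    "district_id": [8815, 4921],  # integers, as read from districts_info.csv
    "state": ["Illinois", "Utah"],
})

# Align dtypes before joining.
engagement_demo["time"] = pd.to_datetime(engagement_demo["time"])
engagement_demo["district_id"] = engagement_demo["district_id"].astype("int64")

# A left join keeps every engagement row, even without district metadata.
merged = engagement_demo.merge(districts_demo, on="district_id", how="left")

# Fraction of missing values per column.
missing_share = engagement_demo.isna().mean()

print(merged)
print(missing_share)
```

A left join is used here so that engagement rows for districts without metadata are kept; an inner join would drop them instead.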

In [None]:
# One-hot encode the semicolon-separated "Sector(s)" column of products
sectors = products['Sector(s)'].str.get_dummies(sep="; ")
sectors.columns = [f"sector_{re.sub(' ', '', c)}" for c in sectors.columns]
products = products.join(sectors)
products.drop("Sector(s)", axis=1, inplace=True)

del sectors
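
To see what `str.get_dummies(sep="; ")` does with the multi-valued sector strings, here is a small self-contained sketch on a toy Series (the values mirror the kinds of entries shown in `products`, but the Series itself is made up). Each distinct sector becomes its own 0/1 column, and a missing entry becomes a row of zeros.

```python
import pandas as pd

# Toy sector values; None stands in for a product with no sector listed.
sectors_raw = pd.Series(
    ["PreK-12", "PreK-12; Higher Ed", "PreK-12; Higher Ed; Corporate", None],
    name="Sector(s)",
)

# Split on "; " and create one indicator column per distinct sector.
dummies = sectors_raw.str.get_dummies(sep="; ")

# Prefix the columns and drop spaces so the names are easy to reference.
dummies.columns = ["sector_" + c.replace(" ", "") for c in dummies.columns]

print(dummies)
```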

In [63]:
products

Unnamed: 0,LP ID,URL,Product Name,Provider/Company Name,Primary Essential Function,sector_Corporate,sector_HigherEd,sector_PreK-12
0,13117,https://www.splashmath.com,SplashLearn,StudyPad Inc.,LC - Digital Learning Platforms,0,0,1
1,66933,https://abcmouse.com,ABCmouse.com,"Age of Learning, Inc",LC - Digital Learning Platforms,0,0,1
2,50479,https://www.abcya.com,ABCya!,"ABCya.com, LLC","LC - Sites, Resources & Reference - Games & Si...",0,0,1
3,92993,http://www.aleks.com/,ALEKS,McGraw-Hill PreK-12,LC - Digital Learning Platforms,0,1,1
4,73104,https://www.achieve3000.com/,Achieve3000,Achieve3000,LC - Digital Learning Platforms,0,0,1
...,...,...,...,...,...,...,...,...
367,88065,https://dochub.com/,DocHub,DocHub,SDO - Other,1,1,1
368,37805,http://google.com/slides/about/,Google Slides,Google LLC,LC - Content Creation & Curation,1,1,1
369,32555,http://www.innersloth.com/gameAmongUs.php,Among Us,InnerSloth,"LC - Sites, Resources & Reference - Games & Si...",0,1,1
370,87841,http://edpuzzle.com,Edpuzzle - Free (Basic Plan),EDpuzzle Inc.,,0,0,0


## **Exploratory Data Analysis**

### 1. Districts


In [64]:
districts.tail(5)

Unnamed: 0,district_id,state,locale,pct_black/hispanic,pct_free/reduced,county_connections_ratio,pp_total_raw
228,9515,New York,Rural,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[18000, 20000["
229,8103,Tennessee,Rural,"[0.2, 0.4[",,"[0.18, 1[","[8000, 10000["
230,4929,Virginia,Rural,"[0, 0.2[","[0.4, 0.6[","[0.18, 1[","[12000, 14000["
231,7975,California,City,"[0.6, 0.8[","[0.6, 0.8[","[0.18, 1[",
232,7164,California,City,"[0.6, 0.8[","[0.6, 0.8[","[0.18, 1[",
