# Explore PISA-2012 Results
## by Viktor Begun

## Preliminary Wrangling

> [According to Wikipedia:](https://en.wikipedia.org/wiki/Programme_for_International_Student_Assessment) "The Programme for International Student Assessment (PISA) is a worldwide study by the Organisation for Economic Co-operation and Development (OECD) in member and non-member nations intended to evaluate educational systems by measuring 15-year-old school pupils' scholastic performance on mathematics, science, and reading."

> This project analyzes the PISA 2012 data provided by Udacity.

In [1]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

> Load in your dataset and describe its properties through the questions below.
Try and motivate your exploration goals through this section.

In [2]:
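# Print the file wrapper to see which encoding Python assumes
# (this is the platform's default text encoding, not one detected from the file)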
with open('pisa2012.csv.zip') as f:
    print(f)

<_io.TextIOWrapper name='pisa2012.csv.zip' mode='r' encoding='cp1252'>
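
The wrapper printed above reports the encoding that `open()` falls back to on this machine; it does not inspect the file, and a zip archive is binary in any case. One way to test an encoding against the actual bytes is sketched below. It assumes the archive holds a single CSV member, and `guess_encoding` is a hypothetical helper, not part of any library. A successful decode only shows that an encoding is possible, not that it is the right one (`latin-1`, for instance, never fails), and a sample cut in the middle of a multi-byte character can make UTF-8 fail spuriously.

```python
import io
import zipfile


def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper for illustration only.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Build a tiny in-memory zip so the sketch runs without the real data file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo.csv", "CNT,NAME\nFRA,Zoé\n".encode("cp1252"))

# Read the raw bytes of the first archive member and test the candidates.
with zipfile.ZipFile(buf) as zf:
    member = zf.namelist()[0]
    with zf.open(member) as fh:
        sample = fh.read(100_000)

detected = guess_encoding(sample)
print(detected)
```

For the real data, the same two steps would be applied to `pisa2012.csv.zip` opened with `zipfile.ZipFile` instead of the in-memory buffer.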


In [3]:
# Read the data into a dataframe
# low_memory=False parses the file in one pass, so columns with mixed types get one consistent dtype
pisa_2012_df = pd.read_csv('pisa2012.csv.zip', encoding='cp1252', low_memory=False)
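
Loading every column is expensive for a table this wide. If only a few variables turn out to matter, `pd.read_csv` can be told to parse just those with `usecols`, and low-cardinality text columns can be stored as `category`. The sketch below runs on a small in-memory CSV; the choice of columns in `wanted` is purely illustrative and not a decision made in this project.

```python
import io

import pandas as pd

# Small stand-in for the real file so the sketch is self-contained.
csv_text = (
    "CNT,OECD,SCHOOLID,ST01Q01\n"
    "Albania,Non-OECD,1,10\n"
    "Albania,Non-OECD,1,9\n"
)

# Illustrative selection; the real analysis may need other columns.
wanted = ["CNT", "OECD", "ST01Q01"]

subset_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=wanted,                                # parse only these columns
    dtype={"CNT": "category", "OECD": "category"}, # compact storage for repeated labels
)

print(subset_df.dtypes)
print(subset_df.memory_usage(deep=True).sum(), "bytes")
```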

In [5]:
# Get general info about the dataframe
pisa_2012_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 485490 entries, 0 to 485489
Columns: 636 entries, Unnamed: 0 to VER_STU
dtypes: float64(250), int64(18), object(368)
memory usage: 2.3+ GB


**636 columns are too many to check manually!**
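
With this many columns, a programmatic overview can stand in for a manual check. The sketch below builds a per-column summary (dtype, share of missing values, number of distinct values) on a toy frame; `summarize_columns` is a hypothetical helper, and the toy values only mimic the shape of the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for pisa_2012_df.
toy_df = pd.DataFrame({
    "CNT": ["Albania", "Albania", "Austria"],
    "ST01Q01": [10, 9, 10],
    "W_FSTR75": [13.7954, np.nan, 12.7307],
})


def summarize_columns(df):
    """One row per column: dtype, share of missing values, distinct non-missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),
        "n_unique": df.nunique(),
    })


summary = summarize_columns(toy_df)

# Columns with the most missing data come first.
print(summary.sort_values("missing_share", ascending=False))
```

Applied to the full dataframe, the same summary could be filtered, for example to drop columns that are almost entirely empty.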

In [10]:
# Get the head of the table
pisa_2012_df.head()

| | Unnamed: 0 | CNT | SUBNATIO | STRATUM | OECD | NC | SCHOOLID | STIDSTD | ST01Q01 | ST02Q01 | ... | W_FSTR75 | W_FSTR76 | W_FSTR77 | W_FSTR78 | W_FSTR79 | W_FSTR80 | WVARSTRR | VAR_UNIT | SENWGT_STU | VER_STU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Albania | 80000 | ALB0006 | Non-OECD | Albania | 1 | 1 | 10 | 1.0 | ... | 13.7954 | 13.9235 | 13.1249 | 13.1249 | 4.3389 | 13.0829 | 19 | 1 | 0.2098 | 22NOV13 |
| 1 | 2 | Albania | 80000 | ALB0006 | Non-OECD | Albania | 1 | 2 | 10 | 1.0 | ... | 13.7954 | 13.9235 | 13.1249 | 13.1249 | 4.3389 | 13.0829 | 19 | 1 | 0.2098 | 22NOV13 |
| 2 | 3 | Albania | 80000 | ALB0006 | Non-OECD | Albania | 1 | 3 | 9 | 1.0 | ... | 12.7307 | 12.7307 | 12.7307 | 12.7307 | 4.2436 | 12.7307 | 19 | 1 | 0.1999 | 22NOV13 |
| 3 | 4 | Albania | 80000 | ALB0006 | Non-OECD | Albania | 1 | 4 | 9 | 1.0 | ... | 12.7307 | 12.7307 | 12.7307 | 12.7307 | 4.2436 | 12.7307 | 19 | 1 | 0.1999 | 22NOV13 |
| 4 | 5 | Albania | 80000 | ALB0006 | Non-OECD | Albania | 1 | 5 | 9 | 1.0 | ... | 12.7307 | 12.7307 | 12.7307 | 12.7307 | 4.2436 | 12.7307 | 19 | 1 | 0.1999 | 22NOV13 |


**Why are there so many entries for a single country?**

In [12]:
# Check the number of entries for each country
pisa_2012_df.CNT.value_counts().sort_index()

Albania                      4743
Argentina                    5908
Australia                   14481
Austria                      4755
Belgium                      8597
Brazil                      19204
Bulgaria                     5282
Canada                      21544
Chile                        6856
China-Shanghai               5177
Chinese Taipei               6046
Colombia                     9073
Connecticut (USA)            1697
Costa Rica                   4602
Croatia                      5008
Czech Republic               5327
Denmark                      7481
Estonia                      4779
Finland                      8829
Florida (USA)                1896
France                       4613
Germany                      5001
Greece                       5125
Hong Kong-China              4670
Hungary                      4810
Iceland                      3508
Indonesia                    5622
Ireland                      5016
Israel                       5055
Italy         

**Each entry is the individual test of one pupil!**
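
The one-row-per-pupil reading can be checked rather than inferred from the row counts. The sketch below compares the number of rows with the number of distinct student IDs per country on a toy frame. It assumes that `STIDSTD` identifies a student within a country, which the column name and the head of the table suggest but which has not been confirmed against the data dictionary here.

```python
import pandas as pd

# Toy stand-in for the CNT and STIDSTD columns of pisa_2012_df.
toy_df = pd.DataFrame({
    "CNT": ["Albania", "Albania", "Albania", "Austria", "Austria"],
    "STIDSTD": [1, 2, 3, 1, 2],  # assumed: student ID, unique within a country
})

# Rows and distinct student IDs per country.
per_country = toy_df.groupby("CNT").agg(
    rows=("STIDSTD", "size"),
    distinct_students=("STIDSTD", "nunique"),
)

# True where every row in the country belongs to a different student.
per_country["one_row_per_student"] = (
    per_country["rows"] == per_country["distinct_students"]
)

print(per_country)
```

If the flag were False for some country in the real data, IDs would repeat there and the interpretation would need another look.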

In [15]:
# Check the encoding Python assumes for the data dictionary of `pisa2012.csv` provided by Udacity
with open('pisadict2012.csv') as f:
    print(f)

<_io.TextIOWrapper name='pisadict2012.csv' mode='r' encoding='cp1252'>


In [16]:
# Read 'pisadict2012.csv' and show the first rows
pisa_2012_dict_df = pd.read_csv('pisadict2012.csv',encoding='cp1252')
pisa_2012_dict_df.head()

| | Unnamed: 0 | x |
|---|---|---|
| 0 | CNT | Country code 3-character |
| 1 | SUBNATIO | Adjudicated sub-region code 7-digit code (3-di... |
| 2 | STRATUM | Stratum ID 7-character (cnt + region ID + orig... |
| 3 | OECD | OECD country |
| 4 | NC | National Centre 6-digit Code |
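
The dictionary is easier to use as a lookup than as a table. The sketch below turns its two columns (`Unnamed: 0` for the variable name, `x` for the description, as shown in the head above) into a Series indexed by variable name, then searches the descriptions by keyword. It runs on a toy copy of three rows; the names `descriptions`, `variable` and `description` are chosen here for readability.

```python
import pandas as pd

# Toy copy of a few rows of pisa_2012_dict_df.
toy_dict_df = pd.DataFrame({
    "Unnamed: 0": ["CNT", "OECD", "NC"],
    "x": [
        "Country code 3-character",
        "OECD country",
        "National Centre 6-digit Code",
    ],
})

# Series mapping variable name -> description.
descriptions = (
    toy_dict_df.set_index("Unnamed: 0")["x"]
    .rename_axis("variable")
    .rename("description")
)

# Look up a single variable.
print(descriptions["OECD"])

# Find all variables whose description mentions a keyword (case-insensitive).
matches = descriptions[descriptions.str.contains("country", case=False)]
print(matches)
```

A keyword search like this is one way to shortlist candidate variables among the several hundred columns without reading every description.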


In [None]:
# Inspect the structure of the dictionary dataframe
pisa_2012_dict_df.info()

### What is the structure of your dataset?

> Your answer here!

### What is/are the main feature(s) of interest in your dataset?

> Your answer here!

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> Your answer here!

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.

> Make sure that, after every plot or related series of plots, you
include a Markdown cell with comments about what you observed and what
you plan on investigating next.

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

> At the end of your report, make sure that you export the notebook as an
html file from the `File > Download as... > HTML` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!