# POLI 175 - Machine Learning for Social Sciences

## Python Refresh I

---

# Data Science with Pandas

## Load Pandas

Loading pandas is very easy. Provided that the package is installed (if not, see [here](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html) for installation instructions), type:

In [2]:
# My code here
import pandas as pd

And now, pandas is loaded.

## Load Data into Python

To start having fun, we need to load data into Python. We can do this in three ways: from a local file, from the internet, or from data typed on the keyboard.

### From a Local File

First, we need to find the working directory, which is the folder Python uses by default when reading files. To do that, we use the `os` library:

```
import os
print(os.getcwd())
```

Then, place the file in that folder. If you need to change the working directory, use:

```
os.chdir("new_path_here")
```

Now that we know the folder and the file is in it, we can load it:

```
dat = pd.read_csv('file_name_here.csv')
```

Here I will load CSV files, but pandas can also read other formats, such as Excel (`pd.read_excel`), SPSS (`pd.read_spss`), Stata (`pd.read_stata`), and SAS (`pd.read_sas`).
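To practice the local-file workflow without downloading anything, here is a minimal sketch that writes a small, made-up DataFrame to a CSV file and reads it back with `pd.read_csv`. The file name `example.csv` and the temporary folder (created with Python's `tempfile` module) are hypothetical stand-ins for your own file and working directory.

```python
import os
import tempfile

import pandas as pd

# A tiny, made-up dataset (hypothetical values, for illustration only)
df_out = pd.DataFrame({"country": ["A", "B", "C"], "score": [1.5, 2.0, 3.25]})

with tempfile.TemporaryDirectory() as folder:
    # Build the full path instead of relying on the current working directory
    path = os.path.join(folder, "example.csv")
    # index=False avoids writing the row labels as an extra column
    df_out.to_csv(path, index=False)
    dat_local = pd.read_csv(path)

print(dat_local)
```

Passing `index=False` to `to_csv` matters here: if the row labels are written to the file, reading it back produces an extra column named `Unnamed: 0`.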

In [3]:
# My code here
import os

In [4]:
print(os.getcwd())

/Users/umbertomignozzetti/PythonQTM385Class


### From the internet

Here, we will load data directly from the internet.

For example, consider the following dataset: https://raw.githubusercontent.com/umbertomig/qtm150/master/datasets/PErisk.csv.

To open it, we use the same `read_csv` function we used for the local file, passing the URL instead of a file name.

In [5]:
# My code here
dat = pd.read_csv('https://raw.githubusercontent.com/umbertomig/qtm150/master/datasets/PErisk.csv')

In [6]:
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |


### From typing in the keyboard

We can also build a dataset from scratch.

For example, we could build a simple dataset in the following way:

```
dat = pd.DataFrame({
    "v1": ['d1', 'd2', 'd3'],
    "v2": [1, 2, 3],
    "v3": ['A', 'B', 'A'],
    "v4": [2.0, 1.1, 2.2]})
```

This works for small datasets, although typing the data by hand quickly becomes inconvenient.
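A quick way to confirm that a hand-typed dataset came out as intended is to check its shape and column types. The sketch below rebuilds the example above and compares it with a row-by-row construction (a list of dictionaries, one per observation), which pandas also accepts:

```python
import pandas as pd

# Column-wise construction: one list per variable (same data as above)
dat = pd.DataFrame({
    "v1": ['d1', 'd2', 'd3'],
    "v2": [1, 2, 3],
    "v3": ['A', 'B', 'A'],
    "v4": [2.0, 1.1, 2.2]})

# Row-wise construction: one dictionary per observation
dat_rows = pd.DataFrame([
    {"v1": 'd1', "v2": 1, "v3": 'A', "v4": 2.0},
    {"v1": 'd2', "v2": 2, "v3": 'B', "v4": 1.1},
    {"v1": 'd3', "v2": 3, "v3": 'A', "v4": 2.2}])

print(dat.shape)             # (rows, columns)
print(dat.dtypes)            # one data type per column
print(dat.equals(dat_rows))  # both constructions hold the same data
```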

## Dataset Information

Suppose we have a pandas DataFrame called `dat`. To make this more concrete, load the following datasets:

```
# For me: PErisk
dat = pd.read_csv('https://raw.githubusercontent.com/umbertomig/qtm150/master/datasets/PErisk.csv')

# For you: tips
tips = pd.read_csv('https://raw.githubusercontent.com/umbertomig/qtm151/main/datasets/tips.csv')
```

If you are having VPN issues, the data is also on Canvas, so you can download it and load it from your own computer.

In [8]:
# My code here
dat = pd.read_csv('https://raw.githubusercontent.com/umbertomig/qtm150/master/datasets/PErisk.csv')
tips = pd.read_csv('https://raw.githubusercontent.com/umbertomig/qtm151/main/datasets/tips.csv')
print(dat.head())
print(tips.head())

      country  courts     barb2  prsexp2  prscorr2      gdpw2
0   Argentina       0 -0.720775        1         3   9.690170
1   Australia       1 -6.907755        5         4  10.304840
2     Austria       1 -4.910337        5         4  10.100940
3  Bangladesh       0  0.775975        1         0   8.379768
4     Belgium       1 -4.617344        5         4  10.250120
   obs  totbill   tip sex smoker  day   time  size
0    1    16.99  1.01   F     No  Sun  Night     2
1    2    10.34  1.66   M     No  Sun  Night     3
2    3    21.01  3.50   M     No  Sun  Night     3
3    4    23.68  3.31   M     No  Sun  Night     2
4    5    24.59  3.61   F     No  Sun  Night     4


### .info(.)

This method prints a summary of the dataset's content: the number of rows, each column's name, non-null count, and data type, and the memory usage.

Syntax and Usage: `dat.info()`

In [9]:
# My code here
dat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   country   62 non-null     object 
 1   courts    62 non-null     int64  
 2   barb2     62 non-null     float64
 3   prsexp2   62 non-null     int64  
 4   prscorr2  62 non-null     int64  
 5   gdpw2     62 non-null     float64
dtypes: float64(2), int64(3), object(1)
memory usage: 3.0+ KB
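One detail about `.info()`: it writes its summary straight to the screen and returns `None`, so wrapping it in `print()` adds a stray `None` line at the end. If you want the summary as text instead (for example, to save it), `.info()` accepts a `buf` argument. A minimal sketch on hypothetical toy data with one missing value:

```python
import io

import pandas as pd

# Hypothetical toy data: the gdp column has one missing value
df = pd.DataFrame({"country": ["A", "B", "C"], "gdp": [1.0, None, 3.0]})

result = df.info()   # prints the summary itself
print(result)        # .info() has no return value, so this prints None

# Redirect the summary into a string buffer instead of the screen
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print("2 non-null" in summary)
```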


### .head(.)

This method returns the first rows of the dataset (five by default).

Syntax and Usage: `print(dat.head())`

In [10]:
# My code here
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |
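`.head()` shows five rows by default, but you can pass a different number; `.tail()` does the same for the last rows. A short sketch on a hypothetical ten-row DataFrame:

```python
import pandas as pd

# Hypothetical toy data: 10 rows with x = 0, 1, ..., 9
df = pd.DataFrame({"x": range(10)})

first_default = df.head()   # first 5 rows (the default)
first_three = df.head(3)    # first 3 rows
last_two = df.tail(2)       # last 2 rows

print(first_three)
print(last_two)
```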


### .shape

This attribute holds the number of rows and columns of the dataset, as a `(rows, columns)` tuple.

Syntax and Usage: `print(dat.shape)`

Note: `.shape` is an attribute, not a method, so no parentheses are needed.

In [11]:
# My code here
print(dat.shape)

(62, 6)
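Because `.shape` is a `(rows, columns)` tuple, it can be unpacked into two variables, which is handy in later code. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data: 4 rows, 3 columns
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": list("wxyz")})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
print(f"{n_rows} rows and {n_cols} columns")
print(len(df))              # len() gives the number of rows only
```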


### .describe(.)

This method computes summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of the dataset.

Syntax and Usage: `print(dat.describe())`

In [12]:
# My code here
dat.describe()

|   | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|
| count | 62.0 | 62.0 | 62.0 | 62.0 | 62.0 |
| mean | 0.451613 | -2.925557 | 3.274194 | 2.532258 | 9.041875 |
| std | 0.501716 | 2.707211 | 1.369089 | 1.501013 | 0.970264 |
| min | 0.0 | -6.907755 | 0.0 | 0.0 | 7.029973 |
| 25% | 0.0 | -4.894882 | 3.0 | 1.25 | 8.381027 |
| 50% | 0.0 | -2.353233 | 3.0 | 2.0 | 9.185412 |
| 75% | 1.0 | -1.301007 | 4.0 | 4.0 | 9.88928 |
| max | 1.0 | 2.337425 | 5.0 | 5.0 | 10.41018 |
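By default, `.describe()` only summarizes numeric columns, which is why `country` does not appear in the table above. Calling it on a single text column instead reports the count, the number of unique values, the most frequent value (`top`), and how often it appears (`freq`). A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data: one text column, one numeric column
df = pd.DataFrame({
    "day": ["Sat", "Sun", "Sat", "Thu"],
    "tip": [1.0, 2.0, 3.0, 4.0]})

numeric_summary = df.describe()      # numeric columns only
text_summary = df["day"].describe()  # count, unique, top, freq
print(numeric_summary)
print(text_summary)
```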


### .values

This attribute holds the observations of the dataset as a NumPy array, without the row and column labels.

Syntax and Usage: `print(dat.values)`

Note: `.values` is an attribute, not a method, so no parentheses are needed.

In [13]:
# My code here
print(dat.values)

[['Argentina' 0 -0.7207754 1 3 9.69017]
 ['Australia' 1 -6.907755 5 4 10.304839999999999]
 ['Austria' 1 -4.910337 5 4 10.10094]
 ['Bangladesh' 0 0.7759748000000001 1 0 8.379767999999999]
 ['Belgium' 1 -4.617344 5 4 10.250119999999999]
 ['Bolivia' 0 -2.46144 0 0 8.583542999999999]
 ['Botswana' 1 -1.244868 4 3 8.77771]
 ['Brazil' 1 -0.45703370000000004 4 3 9.375601]
 ['Burma' 0 1.604343 3 1 7.0967210000000005]
 ['Cameroon' 0 -4.229065 3 1 8.120886]
 ['Canada' 1 -6.907755 5 5 10.41018]
 ['Chile' 1 -1.542761 3 2 9.261224]
 ['Colombia' 0 -2.057821 3 2 9.191972999999999]
 ['Congo-Kinshasa' 0 -2.3232880000000002 1 0 7.095064]
 ['Costa Rica' 1 -5.090003 3 4 9.167328999999999]
 ["Cote d'Ivoire" 1 -4.229065 4 2 8.228711]
 ['Denmark' 1 -6.907755 5 5 10.10651]
 ['Dominican Republic' 0 -2.378862 2 2 8.899731]
 ['Ecuador' 1 -1.845337 3 2 9.117786]
 ['Finland' 1 -6.907755 5 5 10.123669999999999]
 ['Gambia, The' 0 -1.543332 4 2 7.501082000000001]
 ['Ghana' 0 -1.011517 2 1 7.597396000000001]
 ['Greece'
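The pandas documentation recommends `.to_numpy()` as a more explicit alternative to `.values`. Because `dat` mixes text and numbers, the resulting array has `dtype=object`; selecting only the numeric columns first gives a numeric array, which is the form numerical routines usually work with. A minimal sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical toy data: one text column, two numeric columns
df = pd.DataFrame({"country": ["A", "B"], "x1": [1.0, 2.0], "x2": [3.0, 4.0]})

mixed = df.to_numpy()                  # text + numbers -> object array
numeric = df[["x1", "x2"]].to_numpy()  # numbers only -> float array
print(mixed.dtype, mixed.shape)
print(numeric.dtype, numeric.shape)
```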

### .columns

This attribute holds the variable (column) names of the dataset.

Syntax and Usage: `print(dat.columns)`

Note: `.columns` is an attribute, not a method, so no parentheses are needed.

In [14]:
# My code here
print(dat.columns)

Index(['country', 'courts', 'barb2', 'prsexp2', 'prscorr2', 'gdpw2'], dtype='object')
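`dat.columns` is an `Index` object; converting it with `list()` gives an ordinary Python list, and `.rename()` changes column names. A sketch on two columns from this dataset; the new name `log_gdp` is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({"prsexp2": [1, 5], "gdpw2": [9.69017, 10.30484]})

names = list(df.columns)   # plain Python list of column names
# rename() returns a new DataFrame; the original is left unchanged
df_renamed = df.rename(columns={"gdpw2": "log_gdp"})
print(names)
print(list(df_renamed.columns))
```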


### .index

This attribute holds information about the dataset's rows (the row labels, or index).

Syntax and Usage: `print(dat.index)`

Note: `.index` is an attribute, not a method, so no parentheses are needed.

In [15]:
# My code here
print(dat.index)

RangeIndex(start=0, stop=62, step=1)
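The default index just numbers the rows from 0. A column with unique values, such as `country`, can be promoted to the index with `.set_index()`, after which rows can be looked up by label with `.loc`. A minimal sketch using two rows copied from `dat`:

```python
import pandas as pd

df = pd.DataFrame({"country": ["Argentina", "Brazil"], "courts": [0, 1]})
print(df.index)   # default RangeIndex

# Use the country names as row labels instead of 0, 1, ...
df_by_country = df.set_index("country")
print(df_by_country.index)
print(df_by_country.loc["Brazil", "courts"])
```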


**Exercise**: Run the same examples for the `tips` dataset.

In [16]:
## Your answers here!
tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 8 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   obs      244 non-null    int64  
 1   totbill  244 non-null    float64
 2   tip      244 non-null    float64
 3   sex      244 non-null    object 
 4   smoker   244 non-null    object 
 5   day      244 non-null    object 
 6   time     244 non-null    object 
 7   size     244 non-null    int64  
dtypes: float64(2), int64(2), object(4)
memory usage: 15.4+ KB


In [17]:
tips.head()

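|   | obs | totbill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 16.99 | 1.01 | F | No | Sun | Night | 2 |
| 1 | 2 | 10.34 | 1.66 | M | No | Sun | Night | 3 |
| 2 | 3 | 21.01 | 3.5 | M | No | Sun | Night | 3 |
| 3 | 4 | 23.68 | 3.31 | M | No | Sun | Night | 2 |
| 4 | 5 | 24.59 | 3.61 | F | No | Sun | Night | 4 |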


In [18]:
tips.values

array([[1, 16.99, 1.01, ..., 'Sun', 'Night', 2],
       [2, 10.34, 1.66, ..., 'Sun', 'Night', 3],
       [3, 21.01, 3.5, ..., 'Sun', 'Night', 3],
       ...,
       [242, 22.67, 2.0, ..., 'Sat', 'Night', 2],
       [243, 17.82, 1.75, ..., 'Sat', 'Night', 2],
       [244, 18.78, 3.0, ..., 'Thu', 'Night', 2]], dtype=object)

In [19]:
tips.describe()

|   | obs | totbill | tip | size |
|---|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 | 244.0 |
| mean | 122.5 | 19.785943 | 2.998279 | 2.569672 |
| std | 70.580923 | 8.902412 | 1.383638 | 0.9511 |
| min | 1.0 | 3.07 | 1.0 | 1.0 |
| 25% | 61.75 | 13.3475 | 2.0 | 2.0 |
| 50% | 122.5 | 17.795 | 2.9 | 2.0 |
| 75% | 183.25 | 24.1275 | 3.5625 | 3.0 |
| max | 244.0 | 50.81 | 10.0 | 6.0 |


In [20]:
tips.columns

Index(['obs', 'totbill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

In [21]:
tips.index

RangeIndex(start=0, stop=244, step=1)

## Data Manipulation

### Subsetting variables (columns)

The syntax for subsetting variables is simple. For a single variable:

```
dat["var_name"]
```

For two or more, enclose the names in a list (hence the double brackets):

```
dat[["var1", "var2"]]
```
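The number of brackets changes what you get back: single brackets return a one-dimensional `Series`, while a list inside the brackets returns a `DataFrame`, even when the list has only one name. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"country": ["A", "B"], "gdpw2": [9.7, 10.3], "courts": [0, 1]})

one_series = df["gdpw2"]             # Series (1-D)
one_frame = df[["gdpw2"]]            # DataFrame with a single column
two_cols = df[["country", "gdpw2"]]  # DataFrame with two columns

print(type(one_series).__name__, one_series.shape)
print(type(one_frame).__name__, one_frame.shape)
print(type(two_cols).__name__, two_cols.shape)
```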

In [22]:
# My code here
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |


In [23]:
dat["gdpw2"]

0      9.690170
1     10.304840
2     10.100940
3      8.379768
4     10.250120
        ...    
57    10.127270
58     9.414342
59     9.848820
60     7.726213
61     7.965893
Name: gdpw2, Length: 62, dtype: float64

In [25]:
var = dat["gdpw2"]
print(var)

0      9.690170
1     10.304840
2     10.100940
3      8.379768
4     10.250120
        ...    
57    10.127270
58     9.414342
59     9.848820
60     7.726213
61     7.965893
Name: gdpw2, Length: 62, dtype: float64


In [26]:
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |


In [29]:
dat[['country', 'prscorr2', 'gdpw2']]

|   | country | prscorr2 | gdpw2 |
|---|---|---|---|
| 0 | Argentina | 3 | 9.690170 |
| 1 | Australia | 4 | 10.304840 |
| 2 | Austria | 4 | 10.100940 |
| 3 | Bangladesh | 0 | 8.379768 |
| 4 | Belgium | 4 | 10.250120 |
| ... | ... | ... | ... |
| 57 | United Kingdom | 5 | 10.127270 |
| 58 | Uruguay | 2 | 9.414342 |
| 59 | Venezuela | 2 | 9.848820 |
| 60 | Zambia | 1 | 7.726213 |


In [30]:
tips['tip']

0      1.01
1      1.66
2      3.50
3      3.31
4      3.61
       ... 
239    5.92
240    2.00
241    2.00
242    1.75
243    3.00
Name: tip, Length: 244, dtype: float64

### Subsetting cases (rows)

To work with cases, note that pandas supports vectorized operations, which apply a comparison to every row at once. For instance:

```
dat["var_name"] > some_number
```

This returns a Boolean Series: `True` where the variable is greater than the number, and `False` otherwise. To subset the dataset, place this condition inside the brackets:

```
dat[dat["var_name"] > some_number]
```

And that's it! For multiple conditions, wrap each comparison in parentheses and combine them with `&` (and):

```
dat[ (dat["v1"] == "some_value") & (dat["v2"] == "some_other_value") ]
```

If we want something similar to R's `%in%` operator, we can use the `.isin(.)` method:

```
dat[ dat["v1"].isin(["some_value", "some_other_value"]) ]
```
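Two more operators are often useful: `|` (or) and `~` (not). The parentheses around each comparison are required because `&` and `|` bind more tightly than `>` and `==` in Python; without them, the expression is evaluated in the wrong order and typically raises an error. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "courts": [0, 1, 1, 0],
    "prsexp2": [1, 5, 2, 4]})

# Rows that satisfy at least one of the two conditions
either = df[(df["prsexp2"] > 3) | (df["courts"] == 1)]
# Rows whose country is NOT in the list
not_listed = df[~df["country"].isin(["A", "B"])]
print(either)
print(not_listed)
```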

In [31]:
# My code here
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |


In [32]:
dat['prsexp2'] > 3

0     False
1      True
2      True
3     False
4      True
      ...  
57     True
58    False
59    False
60    False
61    False
Name: prsexp2, Length: 62, dtype: bool

In [33]:
dat[dat['prsexp2'] > 3]

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |
| 6 | Botswana | 1 | -1.244868 | 4 | 3 | 8.77771 |
| 7 | Brazil | 1 | -0.457034 | 4 | 3 | 9.375601 |
| 10 | Canada | 1 | -6.907755 | 5 | 5 | 10.41018 |
| 15 | Cote d'Ivoire | 1 | -4.229065 | 4 | 2 | 8.228711 |
| 16 | Denmark | 1 | -6.907755 | 5 | 5 | 10.10651 |
| 19 | Finland | 1 | -6.907755 | 5 | 5 | 10.12367 |
| 20 | Gambia, The | 0 | -1.543332 | 4 | 2 | 7.501082 |


In [35]:
print(dat.head())
(dat['prsexp2'] > 3) & (dat['courts'] == 1)

      country  courts     barb2  prsexp2  prscorr2      gdpw2
0   Argentina       0 -0.720775        1         3   9.690170
1   Australia       1 -6.907755        5         4  10.304840
2     Austria       1 -4.910337        5         4  10.100940
3  Bangladesh       0  0.775975        1         0   8.379768
4     Belgium       1 -4.617344        5         4  10.250120


0     False
1      True
2      True
3     False
4      True
      ...  
57     True
58    False
59    False
60    False
61    False
Length: 62, dtype: bool

In [36]:
dat[(dat['prsexp2'] > 3) & (dat['courts'] == 1)]

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |
| 6 | Botswana | 1 | -1.244868 | 4 | 3 | 8.77771 |
| 7 | Brazil | 1 | -0.457034 | 4 | 3 | 9.375601 |
| 10 | Canada | 1 | -6.907755 | 5 | 5 | 10.41018 |
| 15 | Cote d'Ivoire | 1 | -4.229065 | 4 | 2 | 8.228711 |
| 16 | Denmark | 1 | -6.907755 | 5 | 5 | 10.10651 |
| 19 | Finland | 1 | -6.907755 | 5 | 5 | 10.12367 |
| 23 | Hungary | 1 | -0.904194 | 4 | 3 | 9.35184 |


In [37]:
dat['country'].isin(['Brazil', 'Canada', 'Finland'])

0     False
1     False
2     False
3     False
4     False
      ...  
57    False
58    False
59    False
60    False
61    False
Name: country, Length: 62, dtype: bool

In [38]:
dat[dat['country'].isin(['Brazil', 'Canada', 'Finland'])]

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 7 | Brazil | 1 | -0.457034 | 4 | 3 | 9.375601 |
| 10 | Canada | 1 | -6.907755 | 5 | 5 | 10.41018 |
| 19 | Finland | 1 | -6.907755 | 5 | 5 | 10.12367 |


**Exercise**: Filter the `tips` dataset by:

1. Bills of more than 10 dollars
2. Smokers
3. Weekend

Do each of these separately, then combine all three.

In [39]:
## Your answers here!
tips.head()

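|   | obs | totbill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 16.99 | 1.01 | F | No | Sun | Night | 2 |
| 1 | 2 | 10.34 | 1.66 | M | No | Sun | Night | 3 |
| 2 | 3 | 21.01 | 3.5 | M | No | Sun | Night | 3 |
| 3 | 4 | 23.68 | 3.31 | M | No | Sun | Night | 2 |
| 4 | 5 | 24.59 | 3.61 | F | No | Sun | Night | 4 |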


In [40]:
more_10_bill = tips['totbill'] > 10
smoker = tips['smoker'] == 'Yes'
weeknd = tips['day'].isin(['Sat', 'Sun'])
print(more_10_bill)
print(smoker)
print(weeknd)

0      True
1      True
2      True
3      True
4      True
       ... 
239    True
240    True
241    True
242    True
243    True
Name: totbill, Length: 244, dtype: bool
0      False
1      False
2      False
3      False
4      False
       ...  
239    False
240     True
241     True
242    False
243    False
Name: smoker, Length: 244, dtype: bool
0       True
1       True
2       True
3       True
4       True
       ...  
239     True
240     True
241     True
242     True
243    False
Name: day, Length: 244, dtype: bool


In [43]:
tips[more_10_bill]

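|   | obs | totbill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 16.99 | 1.01 | F | No | Sun | Night | 2 |
| 1 | 2 | 10.34 | 1.66 | M | No | Sun | Night | 3 |
| 2 | 3 | 21.01 | 3.50 | M | No | Sun | Night | 3 |
| 3 | 4 | 23.68 | 3.31 | M | No | Sun | Night | 2 |
| 4 | 5 | 24.59 | 3.61 | F | No | Sun | Night | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 240 | 29.03 | 5.92 | M | No | Sat | Night | 3 |
| 240 | 241 | 27.18 | 2.00 | F | Yes | Sat | Night | 2 |
| 241 | 242 | 22.67 | 2.00 | M | Yes | Sat | Night | 2 |
| 242 | 243 | 17.82 | 1.75 | M | No | Sat | Night | 2 |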


In [44]:
tips[smoker]

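|   | obs | totbill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|---|
| 56 | 57 | 38.01 | 3.00 | M | Yes | Sat | Night | 4 |
| 58 | 59 | 11.24 | 1.76 | M | Yes | Sat | Night | 2 |
| 60 | 61 | 20.29 | 3.21 | M | Yes | Sat | Night | 2 |
| 61 | 62 | 13.81 | 2.00 | M | Yes | Sat | Night | 2 |
| 62 | 63 | 11.02 | 1.98 | M | Yes | Sat | Night | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 234 | 235 | 15.53 | 3.00 | M | Yes | Sat | Night | 2 |
| 236 | 237 | 12.60 | 1.00 | M | Yes | Sat | Night | 2 |
| 237 | 238 | 32.83 | 1.17 | M | Yes | Sat | Night | 2 |
| 240 | 241 | 27.18 | 2.00 | F | Yes | Sat | Night | 2 |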


In [45]:
tips[weeknd]

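|   | obs | totbill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 16.99 | 1.01 | F | No | Sun | Night | 2 |
| 1 | 2 | 10.34 | 1.66 | M | No | Sun | Night | 3 |
| 2 | 3 | 21.01 | 3.50 | M | No | Sun | Night | 3 |
| 3 | 4 | 23.68 | 3.31 | M | No | Sun | Night | 2 |
| 4 | 5 | 24.59 | 3.61 | F | No | Sun | Night | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 238 | 239 | 35.83 | 4.67 | F | No | Sat | Night | 3 |
| 239 | 240 | 29.03 | 5.92 | M | No | Sat | Night | 3 |
| 240 | 241 | 27.18 | 2.00 | F | Yes | Sat | Night | 2 |
| 241 | 242 | 22.67 | 2.00 | M | Yes | Sat | Night | 2 |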


In [46]:
tips[more_10_bill & smoker & weeknd]

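|   | obs | totbill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|---|
| 56 | 57 | 38.01 | 3.0 | M | Yes | Sat | Night | 4 |
| 58 | 59 | 11.24 | 1.76 | M | Yes | Sat | Night | 2 |
| 60 | 61 | 20.29 | 3.21 | M | Yes | Sat | Night | 2 |
| 61 | 62 | 13.81 | 2.0 | M | Yes | Sat | Night | 2 |
| 62 | 63 | 11.02 | 1.98 | M | Yes | Sat | Night | 2 |
| 63 | 64 | 18.29 | 3.76 | M | Yes | Sat | Night | 4 |
| 69 | 70 | 15.01 | 2.09 | M | Yes | Sat | Night | 2 |
| 72 | 73 | 26.86 | 3.14 | F | Yes | Sat | Night | 2 |
| 73 | 74 | 25.28 | 5.0 | F | Yes | Sat | Night | 2 |
| 76 | 77 | 17.92 | 3.08 | M | Yes | Sat | Night | 2 |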


### Simple computations

It is simple to create new variables from existing ones.

```
import numpy as np

# Summing two variables
dat["my_new_var"] = dat["my_old_var1"] + dat["my_old_var2"]

# Multiplying by a constant
dat["my_new_var"] = dat["my_old_var1"] * constant

# Applying a NumPy function (prefer NumPy functions, since pandas is built on NumPy)
dat["my_new_logged_var"] = np.log(dat["my_old_var"])
```
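Here is a runnable version of the templates above, using hypothetical columns `x1` and `x2`. Keep in mind that `np.log` is the natural logarithm and returns `-inf` for zero and `NaN` for negative values, so check a variable's range before logging it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"x1": [1.0, 2.0, 4.0], "x2": [10.0, 20.0, 30.0]})

df["total"] = df["x1"] + df["x2"]    # element-wise sum of two columns
df["x1_times_3"] = df["x1"] * 3      # multiply by a constant
df["log_x1"] = np.log(df["x1"])      # natural logarithm (base e)
df["back"] = np.exp(df["log_x1"])    # np.exp undoes np.log
print(df)
```

The last line mirrors the `gdppc` computation below, where `np.exp` recovers a level from a logged variable.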

In [47]:
# My code here
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 |
|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 |


In [50]:
dat['risk'] = 10 - (dat['prsexp2'] + dat['prscorr2'])
dat.head()

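|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 | risk |
|---|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 | 6 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 | 1 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 | 1 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 | 9 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 | 1 |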


In [51]:
import numpy as np
dat['gdppc'] = np.exp(dat['gdpw2'])
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 | risk | gdppc |
|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 | 6 | 16157.990919 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 | 1 | 29876.873543 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 | 1 | 24365.902611 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 | 9 | 4357.997753 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 | 1 | 28285.936029 |


**Exercise**: In the `tips` dataset, create the variable `prop_tip`, the tip as a proportion of the total bill.

In [52]:
## Your answers here!
tips['prop_tip'] = tips['tip'] / tips['totbill']
tips.describe()

|   | obs | totbill | tip | size | prop_tip |
|---|---|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 | 244.0 | 244.0 |
| mean | 122.5 | 19.785943 | 2.998279 | 2.569672 | 0.160803 |
| std | 70.580923 | 8.902412 | 1.383638 | 0.9511 | 0.061072 |
| min | 1.0 | 3.07 | 1.0 | 1.0 | 0.035638 |
| 25% | 61.75 | 13.3475 | 2.0 | 2.0 | 0.129127 |
| 50% | 122.5 | 17.795 | 2.9 | 2.0 | 0.15477 |
| 75% | 183.25 | 24.1275 | 3.5625 | 3.0 | 0.191475 |
| max | 244.0 | 50.81 | 10.0 | 6.0 | 0.710345 |


## Statistics

We can easily compute statistics from the data. Here are a few of the available methods:

| Method           | Description                              |
|------------------|------------------------------------------|
| `.median()`      | Median                                   |
| `.mean()`        | Mean                                     |
| `.min()`         | Minimum                                  |
| `.max()`         | Maximum                                  |
| `.var()`         | Variance                                 |
| `.std()`         | Standard deviation                       |
| `.sum()`         | Sum of the values                        |
| `.mode()`        | Most frequent value(s)                   |
| `.quantile(val)` | Quantile at `val` (between 0 and 1)      |
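To check what each method returns, here is a sketch on a small hypothetical Series whose statistics are easy to verify by hand. Two details: pandas' `.var()` and `.std()` compute the sample versions (dividing by n - 1), and `.mode()` returns a Series because several values can tie for most frequent.

```python
import pandas as pd

# Hypothetical toy data
s = pd.Series([2, 3, 3, 5, 7])

stats = {
    "mean": s.mean(),          # (2 + 3 + 3 + 5 + 7) / 5
    "median": s.median(),      # middle value of the sorted data
    "min": s.min(),
    "max": s.max(),
    "var": s.var(),            # sample variance (divides by n - 1)
    "std": s.std(),            # square root of the sample variance
    "sum": s.sum(),
    "q25": s.quantile(0.25),   # first quartile
}
print(stats)

# .mode() returns a Series; with ties it lists every most frequent value
print(s.mode().tolist())
ties = pd.Series(["Sat", "Sun", "Sat", "Sun", "Thu"]).mode()
print(ties.tolist())
```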

In [53]:
# My code here
dat.head()

|   | country | courts | barb2 | prsexp2 | prscorr2 | gdpw2 | risk | gdppc |
|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 0 | -0.720775 | 1 | 3 | 9.69017 | 6 | 16157.990919 |
| 1 | Australia | 1 | -6.907755 | 5 | 4 | 10.30484 | 1 | 29876.873543 |
| 2 | Austria | 1 | -4.910337 | 5 | 4 | 10.10094 | 1 | 24365.902611 |
| 3 | Bangladesh | 0 | 0.775975 | 1 | 0 | 8.379768 | 9 | 4357.997753 |
| 4 | Belgium | 1 | -4.617344 | 5 | 4 | 10.25012 | 1 | 28285.936029 |


In [55]:
dat['gdppc'].mean()

12378.127150748098

In [56]:
dat['gdppc'].median()

9754.001199015076

In [58]:
dat['prsexp2'].mode()

0    3
dtype: int64

In [60]:
dat['gdppc'].quantile(0.25)

4363.4983113830685

In [61]:
dat['gdppc'].quantile(0.90)

27436.642085297528

In [62]:
dat['gdppc'].quantile(0.10)

2052.199573919149

**Exercise**: For the `tips` dataset:

1. Compute the mean and median of tip
2. Compute the mode of day
3. Compute the first quartile of `totbill`.

In [63]:
## Your answers here!
print('The mean of tip is: ' + str(tips['tip'].mean()))

The mean of tip is: 2.9982786885245902


In [64]:
print(tips['tip'].median())

2.9


In [65]:
print(tips['day'].mode())

0    Sat
dtype: object


In [66]:
print(tips['totbill'].quantile(0.25))

13.3475


**Great job!!!**