# IPython & Jupyter: a PyCon 2017 tutorial

In [1]:
import pandas as pd

Class outline:

* A quick installation check of [ipython](https://ipython.org/install.html) and [jupyter notebook](https://jupyter.readthedocs.io/en/latest/install.html)
* An overview of the IPython project from [the official website](http://ipython.org), and [jupyter](https://jupyter.org)
* Super basic intro to the notebook: typing code.
* [Notebook Basics](examples/Notebook/Notebook%20Basics.ipynb)
* [IPython - beyond plain python](examples/IPython%20Kernel/Beyond%20Plain%20Python.ipynb)
* [Markdown Cells](examples/Notebook/Working%20With%20Markdown%20Cells.ipynb)
* [Rich Display System](examples/IPython%20Kernel/Rich%20Output.ipynb)
* [Custom Display logic](examples/IPython%20Kernel/Custom%20Display%20Logic.ipynb)
* [Customizing IPython - a condensed version](exercises/Customization/Condensed.ipynb)
* [Running a Secure Public Notebook Server](examples/Notebook/Running%20the%20Notebook%20Server.ipynb#Securing-the-notebook-server)
* [How Jupyter/IPython works](examples/Notebook/Multiple%20Languages%2C%20Frontends.ipynb) to run code in different languages.

In [2]:
df = pd.read_csv("imports-85.data", header=None)
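
Reading with `header=None` leaves the columns labelled 0 to 25, which makes later cells such as `df[23]` hard to read. A minimal sketch of attaching names at load time follows. The column names are an assumption: they come from the UCI Automobile data set description rather than from this notebook, so check them against the `imports-85.names` file that accompanies your copy of the data. The inline sample holds only the first ten fields of rows that appear in this notebook's output, so the sketch runs without the data file.

```python
import io
import pandas as pd

# Assumed names for the first ten fields, taken from the UCI Automobile
# data set description (not from this notebook).
FIRST_TEN_NAMES = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base",
]

# Stand-in for imports-85.data: the first ten fields of three rows shown
# in this notebook's output. The real file has 26 fields per row.
sample_text = (
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.6\n"
    "-1,95,volvo,gas,std,four,sedan,rwd,front,109.1\n"
    "0,91,toyota,gas,std,four,hatchback,fwd,front,95.7\n"
)

# names= labels the columns; header=None says the file has no header row.
named = pd.read_csv(io.StringIO(sample_text), header=None, names=FIRST_TEN_NAMES)

print(named["make"].tolist())
print(named["wheel-base"].max())
```

With names in place, a lookup like `named["make"].value_counts()` states its intent more clearly than `df[2].value_counts()`.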

Get this tutorial:

    git clone https://github.com/ipython/ipython-in-depth

Install IPython and Jupyter:

with [conda](https://www.continuum.io/downloads):

    conda install ipython jupyter

with pip:

    # first, upgrade pip!
    pip install --upgrade pip
    pip install --upgrade ipython jupyter

Start the notebook in the tutorial directory:

    cd ipython-in-depth
    jupyter notebook

In [3]:
print("The first 3 rows of the dataframe")
df.head(3)

The first 3 rows of the dataframe


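|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |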


This directory contains many more detailed notebooks on other topics, but we cannot cover them all in a 3-hour tutorial. We encourage you to explore them and practice on your own.

In [4]:
df.shape

(205, 26)

In [5]:
df.tail(3)

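|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.1 | ... | 173 | mpfi | 3.58 | 2.87 | 8.8 | 134 | 5500 | 18 | 23 | 21485 |
| 203 | -1 | 95 | volvo | diesel | turbo | four | sedan | rwd | front | 109.1 | ... | 145 | idi | 3.01 | 3.4 | 23.0 | 106 | 4800 | 26 | 27 | 22470 |
| 204 | -1 | 95 | volvo | gas | turbo | four | sedan | rwd | front | 109.1 | ... | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 19 | 25 | 22625 |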


In [6]:
df.sample(3)

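|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 157 | 0 | 91 | toyota | gas | std | four | hatchback | fwd | front | 95.7 | ... | 98 | 2bbl | 3.19 | 3.03 | 9.0 | 70 | 4800 | 30 | 37 | 7198 |
| 129 | 1 | ? | porsche | gas | std | two | hatchback | rwd | front | 98.4 | ... | 203 | mpfi | 3.94 | 3.11 | 10.0 | 288 | 5750 | 17 | 28 | ? |
| 51 | 1 | 104 | mazda | gas | std | two | hatchback | fwd | front | 93.1 | ... | 91 | 2bbl | 3.03 | 3.15 | 9.0 | 68 | 5000 | 31 | 38 | 6095 |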


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
0     205 non-null int64
1     205 non-null object
2     205 non-null object
3     205 non-null object
4     205 non-null object
5     205 non-null object
6     205 non-null object
7     205 non-null object
8     205 non-null object
9     205 non-null float64
10    205 non-null float64
11    205 non-null float64
12    205 non-null float64
13    205 non-null int64
14    205 non-null object
15    205 non-null object
16    205 non-null int64
17    205 non-null object
18    205 non-null object
19    205 non-null object
20    205 non-null float64
21    205 non-null object
22    205 non-null object
23    205 non-null int64
24    205 non-null int64
25    205 non-null object
dtypes: float64(5), int64(5), object(16)
memory usage: 41.8+ KB
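
The `info()` output reports 16 `object` columns and no nulls, yet the sampled rows show `?` where a value is missing (for example in columns 1 and 25). Because `?` is read as text, any column containing it stays non-numeric. One possible remedy, sketched here under the assumption that `?` is the only missing-value marker in the file, is to pass `na_values="?"` to `pd.read_csv`. The inline sample reuses the first ten fields of rows shown in this notebook, so it runs without the data file.

```python
import io
import pandas as pd

# Stand-in for imports-85.data: the first ten fields of four rows shown
# in this notebook's output. The real file has 26 fields per row.
sample_text = (
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.6\n"
    "1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.5\n"
    "-1,95,volvo,gas,std,four,sedan,rwd,front,109.1\n"
    "0,91,toyota,gas,std,four,hatchback,fwd,front,95.7\n"
)

# Default read: "?" is kept as text, so column 1 is not numeric.
raw = pd.read_csv(io.StringIO(sample_text), header=None)

# Treat "?" as missing: column 1 becomes float64 with NaN for the gaps.
clean = pd.read_csv(io.StringIO(sample_text), header=None, na_values="?")

print(raw[1].dtype)
print(clean[1].dtype)          # float64
print(clean[1].isna().sum())   # 2
```

After this change `info()` would report fewer non-null values for the affected columns, which is a more honest picture of the data than 205 non-null everywhere.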


In [8]:
df[23].describe()

count    205.000000
mean      25.219512
std        6.542142
min       13.000000
25%       19.000000
50%       24.000000
75%       30.000000
max       49.000000
Name: 23, dtype: float64

In [9]:
df[23].value_counts()

31    28
19    27
24    22
27    14
17    13
26    12
23    12
21     8
30     8
25     8
38     7
28     7
37     6
16     6
22     4
15     3
18     3
29     3
20     3
14     2
49     1
47     1
32     1
33     1
34     1
35     1
36     1
45     1
13     1
Name: 23, dtype: int64

In [10]:
df[2].describe()

count        205
unique        22
top       toyota
freq          32
Name: 2, dtype: object

In [11]:
df[2].value_counts()

toyota           32
nissan           18
mazda            17
mitsubishi       13
honda            13
volkswagen       12
subaru           12
peugot           11
volvo            11
dodge             9
mercedes-benz     8
bmw               8
plymouth          7
audi              7
saab              6
porsche           5
isuzu             4
jaguar            3
alfa-romero       3
chevrolet         3
renault           2
mercury           1
Name: 2, dtype: int64

In [12]:
df[5].value_counts()

four    114
two      89
?         2
Name: 5, dtype: int64

In [13]:
df[1].value_counts()

?      41
161    11
91      8
150     7
104     6
134     6
128     6
95      5
94      5
85      5
74      5
168     5
65      5
103     5
102     5
122     4
93      4
106     4
118     4
148     4
154     3
137     3
115     3
83      3
125     3
101     3
164     2
119     2
113     2
110     2
197     2
194     2
108     2
129     2
89      2
145     2
87      2
81      2
188     2
153     2
192     2
158     2
231     1
90      1
78      1
107     1
256     1
186     1
121     1
142     1
77      1
98      1
Name: 1, dtype: int64
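
The two cells above find the `?` placeholder by inspecting one column at a time. A rough sketch of checking every column in a single pass is shown below. It casts the frame to strings before comparing, so numeric columns are handled uniformly. The inline sample (the first ten fields of rows shown in this notebook) stands in for the real file.

```python
import io
import pandas as pd

# Stand-in for imports-85.data: the first ten fields of four rows shown
# in this notebook's output. The real file has 26 fields per row.
sample_text = (
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.6\n"
    "1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.5\n"
    "-1,95,volvo,gas,std,four,sedan,rwd,front,109.1\n"
    "0,91,toyota,gas,std,four,hatchback,fwd,front,95.7\n"
)
sample_df = pd.read_csv(io.StringIO(sample_text), header=None)

# Count "?" cells per column; astype(str) makes the comparison safe
# for integer and float columns as well as text columns.
placeholder_counts = sample_df.astype(str).eq("?").sum()

# Keep only the columns that actually contain the placeholder.
affected = placeholder_counts[placeholder_counts > 0]
print(affected)
```

Run against the full `df`, the same two lines would list every column that needs cleaning, including columns 1 and 5 seen above.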

In [14]:
# Flag numeric columns whose extremes fall outside the 1.5*IQR fences.
for i in df.columns:
    if df[i].dtype in ['int64', 'float64']:
        print('\nAttribute-', i, ':', df[i].dtype)
        Q1 = df[i].quantile(0.25)
        print('Q1', Q1)
        Q3 = df[i].quantile(0.75)
        print('Q3', Q3)
        IQR = Q3 - Q1
        print('IQR', IQR)
        # Avoid the names min/max, which would shadow Python's built-ins.
        col_min = df[i].min()
        col_max = df[i].max()
        lower_fence = Q1 - 1.5 * IQR
        upper_fence = Q3 + 1.5 * IQR
        if col_min < lower_fence:
            print('Low outlier is found')
        if col_max > upper_fence:
            print('High outlier is found')


Attribute- 0 : int64
Q1 0.0
Q3 2.0
IQR 2.0

Attribute- 9 : float64
Q1 94.5
Q3 102.4
IQR 7.900000000000006
High outlier is found

Attribute- 10 : float64
Q1 166.3
Q3 183.1
IQR 16.799999999999983
Low outlier is found

Attribute- 11 : float64
Q1 64.1
Q3 66.9
IQR 2.8000000000000114
High outlier is found

Attribute- 12 : float64
Q1 52.0
Q3 55.5
IQR 3.5

Attribute- 13 : int64
Q1 2145.0
Q3 2935.0
IQR 790.0

Attribute- 16 : int64
Q1 97.0
Q3 141.0
IQR 44.0
High outlier is found

Attribute- 20 : float64
Q1 8.6
Q3 9.4
IQR 0.8000000000000007
Low outlier is found
High outlier is found

Attribute- 23 : int64
Q1 19.0
Q3 30.0
IQR 11.0
High outlier is found

Attribute- 24 : int64
Q1 25.0
Q3 34.0
IQR 9.0
High outlier is found
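
The loop above reports only whether a column's minimum or maximum falls outside the 1.5 × IQR fences. A vectorized variant, sketched below, applies the same rule to all numeric columns at once and also counts how many values fall outside each fence. The helper name `iqr_summary` and the toy frame are hypothetical, made up purely for illustration.

```python
import pandas as pd

def iqr_summary(frame):
    """Return Q1, Q3, IQR and outlier counts for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return pd.DataFrame({
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        # Comparisons align the per-column fences with the frame's columns.
        "n_low": (numeric < lower_fence).sum(),
        "n_high": (numeric > upper_fence).sum(),
    })

# Toy data (made up): column "a" has one high outlier, "b" has none,
# and the text column "c" is skipped by select_dtypes.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 100],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0],
    "c": list("vwxyz"),
})
summary = iqr_summary(toy)
print(summary)
```

Calling `iqr_summary(df)` on the frame loaded earlier should cover the same ten numeric columns as the loop, with counts in place of the printed messages.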
