# Pandas for Data Analysis: Data Structures & Types

**Outline:**

* [Pandas Data Structures](#Pandas-Data-Structures)
  * [Python List](#Python-List)
  * [Series](#Series)
  * [DataFrame](#DataFrame)
* [Pandas Data Types](#Pandas-Data-Types)

In [2]:
import pandas as pd

## Pandas Data Structures

### Python List

In [3]:
data = [113, 1463, 95, 33]
data[2]

95

### Series

In [4]:
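pd.Series(dtype='float64')  # pass dtype explicitly; the default for an empty Series differs across pandas versions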

Series([], dtype: float64)

In [5]:
series_data = pd.Series([113, 1463, 95, 33])
series_data

0     113
1    1463
2      95
3      33
dtype: int64

In [6]:
type(series_data)

pandas.core.series.Series

In [7]:
series_data[2]

95

In [8]:
series_data = pd.Series({'a': 113, 'b': 1463, 'c': 95, 'd': 33})
series_data

a     113
b    1463
c      95
d      33
dtype: int64

In [9]:
series_data.iloc[1]  # positional access; integer keys with plain [] on a string index are deprecated

1463

In [10]:
series_data['b']

1463

In [11]:
series_data = pd.Series({'a': 113, 'b': 1463, 'c': 95, 'd': 33}, index=['b', 'c', 'd', 'e', 'f'])
series_data

b    1463.0
c      95.0
d      33.0
e       NaN
f       NaN
dtype: float64
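
The values above change from `int64` to `float64` because `NaN` is a floating-point value, so a missing label forces the whole Series to float. The following is a minimal sketch of that behaviour, assuming a pandas version that provides the nullable `Int64` extension dtype; the variable names are illustrative and not part of the notebook.

```python
import pandas as pd

counts = pd.Series({'b': 1463, 'c': 95})      # all labels present: int64
with_gap = counts.reindex(['b', 'c', 'e'])    # 'e' has no value, so it becomes NaN

# NaN is a float, so the dtype is upcast to float64.
print(with_gap.dtype)

# Assumption: the nullable Int64 dtype is available; it keeps integers
# and marks the missing entry as <NA> instead of NaN.
nullable = with_gap.astype('Int64')
print(nullable.dtype)
print(nullable.isna().sum())
```

Whether the nullable dtype is worth using depends on the downstream code; plain `float64` with `NaN` remains the most widely supported representation.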

In [12]:
series_data.isnull()

b    False
c    False
d    False
e     True
f     True
dtype: bool

In [13]:
series_data.isnull().sum()

2

In [14]:
series_data.index

Index(['b', 'c', 'd', 'e', 'f'], dtype='object')

In [15]:
series_data.values

array([ 1463.,    95.,    33.,    nan,    nan])

In [16]:
[1, 2, 3] + [3, 4, 6]

[1, 2, 3, 3, 4, 6]

In [17]:
series_data + series_data

b    2926.0
c     190.0
d      66.0
e       NaN
f       NaN
dtype: float64
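
Unlike list `+`, which concatenates, Series `+` adds element by element and pairs values by index label rather than by position. A small sketch with two hypothetical Series (`left` and `right` are illustrative names) whose labels only partly overlap:

```python
import pandas as pd

left = pd.Series({'a': 1, 'b': 2, 'c': 3})
right = pd.Series({'b': 10, 'c': 20, 'd': 30})

# Labels present in only one operand produce NaN in the result.
total = left + right
print(total)

# Series.add with fill_value treats a label missing from one side as 0.
filled = left.add(right, fill_value=0)
print(filled)
```

Here `total` is `NaN` for `'a'` and `'d'`, while `filled` keeps the value from whichever side has it.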

In [18]:
pd.concat([series_data, pd.Series([113, 1463, 95, 33])])  # Series.append was removed in pandas 2.0

b    1463.0
c      95.0
d      33.0
e       NaN
f       NaN
0     113.0
1    1463.0
2      95.0
3      33.0
dtype: float64

In [19]:
series_data = pd.concat([series_data, pd.Series({'b': 99})])  # Series.append was removed in pandas 2.0

In [20]:
series_data.index

Index(['b', 'c', 'd', 'e', 'f', 'b'], dtype='object')

In [21]:
series_data

b    1463.0
c      95.0
d      33.0
e       NaN
f       NaN
b      99.0
dtype: float64

In [22]:
series_data.iloc[5]  # positional access to the sixth element

99.0

In [23]:
series_data['b']

b    1463.0
b      99.0
dtype: float64
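
The two lookups above return different shapes because the label `'b'` now appears twice in the index. One way to make the intent explicit is to use `.iloc` for positions and `.loc` for labels. This is a sketch on a small stand-in Series (`s` is an illustrative name), not the notebook's own object:

```python
import pandas as pd

s = pd.Series([1463.0, 95.0, 33.0, 99.0], index=['b', 'c', 'd', 'b'])

by_position = s.iloc[3]     # single value at position 3
by_label = s.loc['b']       # Series with every row labelled 'b'
unique_label = s.loc['c']   # single value, because 'c' occurs once

# Index.is_unique reports whether any label is repeated.
has_duplicates = not s.index.is_unique
print(by_position, len(by_label), unique_label, has_duplicates)
```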

### DataFrame

In [24]:
personal_data_dict = {
    'age': [39, 50, 38],
    'education': ['Bachelors', 'Bachelors', 'HS-grad'],
    'occupation': ['Adm-clerical', 'Tech-support', 'Sales'],
    'sex': ['Male', 'Female', 'Female'],
    'capital-gain': [2174, 111, 993]
}
df = pd.DataFrame(personal_data_dict)

In [25]:
df

   age  capital-gain  education    occupation     sex
0   39          2174  Bachelors  Adm-clerical    Male
1   50           111  Bachelors  Tech-support  Female
2   38           993    HS-grad         Sales  Female

In [26]:
type(df)

pandas.core.frame.DataFrame

In [27]:
df.shape

(3, 5)

In [28]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [29]:
df.values

array([[39, 2174, 'Bachelors', 'Adm-clerical', 'Male'],
       [50, 111, 'Bachelors', 'Tech-support', 'Female'],
       [38, 993, 'HS-grad', 'Sales', 'Female']], dtype=object)

In [30]:
df.columns

Index(['age', 'capital-gain', 'education', 'occupation', 'sex'], dtype='object')

In [31]:
df.head(2)

   age  capital-gain  education    occupation     sex
0   39          2174  Bachelors  Adm-clerical    Male
1   50           111  Bachelors  Tech-support  Female

In [32]:
df.tail()

   age  capital-gain  education    occupation     sex
0   39          2174  Bachelors  Adm-clerical    Male
1   50           111  Bachelors  Tech-support  Female
2   38           993    HS-grad         Sales  Female

In [33]:
df['occupation']

0    Adm-clerical
1    Tech-support
2           Sales
Name: occupation, dtype: object

In [34]:
df.loc[1, 'age']  # row label 1, column 'age'; one lookup instead of chained indexing

50

In [35]:
df['capital-gain']

0    2174
1     111
2     993
Name: capital-gain, dtype: int64

In [36]:
df['name']

KeyError: 'name'

In [37]:
df['age']

0    39
1    50
2    38
Name: age, dtype: int64

In [39]:
df.age.value_counts()

39    1
50    1
38    1
Name: age, dtype: int64

In [40]:
type(df.age)

pandas.core.series.Series

## Pandas Data Types

In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
age             3 non-null int64
capital-gain    3 non-null int64
education       3 non-null object
occupation      3 non-null object
sex             3 non-null object
dtypes: int64(2), object(3)
memory usage: 200.0+ bytes
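
`df.info()` reports each column's dtype alongside the non-null counts. The sketch below shows a few other ways to inspect and change dtypes, rebuilding a reduced version of the DataFrame so that it runs on its own. Converting `education` to `category` is an illustrative choice, not a step from the notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [39, 50, 38],
    'education': ['Bachelors', 'Bachelors', 'HS-grad'],
    'capital-gain': [2174, 111, 993],
})

# One dtype per column, returned as a Series.
print(df.dtypes)

# Columns with few distinct strings can be stored as category.
df['education'] = df['education'].astype('category')

# select_dtypes filters columns by dtype family.
numeric_columns = df.select_dtypes(include='number').columns.tolist()
print(numeric_columns)
```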
