# pandas (Python)

pandas is the main library of interest here. It provides data structures and data manipulation tools designed to make data cleaning
and analysis fast and easy in Python. pandas is often used in tandem with numerical
computing tools like NumPy and SciPy, analytical libraries like statsmodels and
scikit-learn, and data visualization libraries like matplotlib. pandas adopts significant
parts of NumPy’s idiomatic style of array-based computing, especially array-based
functions and a preference for processing data without explicit for loops.

While pandas adopts many coding idioms from NumPy, the biggest difference is that
pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast,
is best suited for working with homogeneous numerical array data.
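
As a rough illustration of that difference, the sketch below builds a NumPy array from mixed inputs and a DataFrame with columns of different types. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixed inputs are coerced to one type.
mixed = np.array([1, 2.5, "a"])
print(mixed.dtype.kind)  # 'U': every element was converted to a string

# A DataFrame keeps a separate dtype per column (illustrative values only).
df = pd.DataFrame({
    "label": ["a", "b", "c"],
    "count": [1, 2, 3],
    "ratio": [0.5, 1.5, 2.5],
})
print(df.dtypes)
```

Here the array falls back to a string type for all elements, while the DataFrame stores the integer and float columns with their own numeric dtypes alongside the text column.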

In [1]:
import pandas as pd
from pandas import Series, DataFrame

# Introduction to pandas Data Structures

Series: A Series is a one-dimensional array-like object containing a sequence of values (of
similar types to NumPy types) and an associated array of data labels, called its index.

In [2]:
obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [3]:
print(obj.values)
print(obj.index) # like range(4)

[ 4  7 -5  3]
RangeIndex(start=0, stop=4, step=1)


In [4]:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2

d    4
b    7
a   -5
c    3
dtype: int64

In [5]:
print(obj2['a'])
print(obj2)
obj2['d'] = 6
print(obj2)
print(obj2[['c', 'a', 'd']])

-5
d    4
b    7
a   -5
c    3
dtype: int64
d    6
b    7
a   -5
c    3
dtype: int64
c    3
a   -5
d    6
dtype: int64


In [6]:
# Some operations: filtering with a boolean condition
obj2[obj2 > 0]

d    6
b    7
c    3
dtype: int64
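
To see why the filter above works, it may help to look at the intermediate result. This is a small sketch that rebuilds obj2 with the values it holds at this point; the names mask and positive are introduced here only for illustration.

```python
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

# The comparison is applied element-wise and yields a boolean Series
# that shares the index of obj2.
mask = obj2 > 0
print(mask)

# Indexing with the boolean Series keeps only the labels where it is True.
positive = obj2[mask]
print(positive)
```

The link between each index label and its value is preserved through both steps.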

In [7]:
obj2 * 2

d    12
b    14
a   -10
c     6
dtype: int64

In [8]:
import numpy as np
np.exp(obj2)

d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping
of index values to data values.

In [9]:
'b' in obj2  # membership test against the index labels, as with dict keys

True
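
The dict analogy can be pushed a little further. The sketch below, which rebuilds obj2 so that it runs on its own, tries a few dict-style operations on a Series; the label 'e' is deliberately absent.

```python
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

print("b" in obj2)        # membership checks the index labels, not the values
print("e" in obj2)        # 'e' is not a label in the index
print(obj2.get("e", 0))   # like dict.get: returns the default for a missing label
print(list(obj2.keys()))  # keys() returns the index
print(obj2.to_dict())     # convert back to a plain Python dict
```

Unlike a plain dict, the Series also keeps its positional order and supports the array-style operations shown earlier.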

In [10]:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [11]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)  # labels not found in sdata get NaN (missing / NA)
obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [13]:
import pandas as pd
print(pd.isnull(obj4))

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
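
A few related checks, sketched with the same data: pd.isna is another name for pd.isnull, pd.notna is its inverse, and both are also available as Series methods.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)

# Inverse check: True where a value is present.
print(pd.notna(obj4))

# Method form of the same test, here used to count missing entries.
print(obj4.isna().sum())

# NaN is a floating-point value, which is why the integer data became float64.
print(obj4.dtype)
```

Note that 'Utah' is dropped because it does not appear in states, while 'California' is kept with a missing value.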
