# Pandas Data Structures

Pandas has historically provided three data structures:

(1) Series: a one-dimensional labeled array capable of holding any data type

(2) DataFrame: a two-dimensional, labeled, size-mutable tabular structure with heterogeneous columns

(3) Panel: a three-dimensional, labeled, size-mutable array. Panel was deprecated in pandas 0.20 and removed in 0.25, so current code works with Series and DataFrame only.


# Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is:

s = pd.Series(data, index=index)

Here, data can be many different things:

* a Python dict
* an ndarray
* a scalar value (like 2)

The passed index is a list of axis labels. The behavior therefore depends on what data is.

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
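The ndarray and scalar cases can be sketched as follows. This is a minimal illustration only; the names `values`, `s_default`, `s_labeled`, and `s_scalar` are made up for the example.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])

# No index passed: pandas creates the labels 0, 1, ..., len(data) - 1.
s_default = pd.Series(values)

# Explicit index: it must have the same length as the ndarray.
s_labeled = pd.Series(values, index=['a', 'b', 'c'])

# Scalar data: the value is repeated once per index label.
s_scalar = pd.Series(2, index=['a', 'b', 'c'])

print(s_default.index.tolist())  # [0, 1, 2]
print(s_labeled['b'])            # 20
print(s_scalar.tolist())         # [2, 2, 2]
```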



In [2]:
# To get started, import NumPy and pandas into your namespace:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a   -1.457993
b    0.545965
c    0.894790
d    1.326906
e   -0.926404
dtype: float64


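The Series constructor can also convert a dictionary, using the dictionary's keys as the index. The output below comes from an older pandas release that sorted the keys; pandas 0.23 and later on Python 3.6+ keep the dictionary's insertion order.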


In [3]:
d = {'mumbai': 2000, 'Delhi': 1300, 'Paris': 900, 'San Francisco': 2100,
     'Austin': 450, 'London': None}
cities = pd.Series(d)
cities

Austin            450.0
Delhi            1300.0
London              NaN
Paris             900.0
San Francisco    2100.0
mumbai           2000.0
dtype: float64
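When an index is passed together with a dict, the values are looked up by label. The sketch below assumes a small made-up dict (`prices`) and a label (`'Tokyo'`) that is deliberately absent from it, to show how missing labels turn into NaN.

```python
import pandas as pd

prices = {'Paris': 900, 'Delhi': 1300, 'Austin': 450}

# Only the labels listed in `index` are kept, in that order.
# 'Tokyo' is not a key of the dict, so its value becomes NaN.
subset = pd.Series(prices, index=['Austin', 'Paris', 'Tokyo'])

print(subset)
print(subset.isna().tolist())  # [False, False, True]
print(subset.dtype)            # float64, because NaN forces a float dtype
```

This is also why the `cities` Series above has dtype float64: the `None` entry for London is stored as NaN.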

Key Features of a Series:

* Homogeneous data
* Size immutable: the length cannot be changed
* Values are mutable
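Two of these features, mutable values and homogeneous data, can be checked with a short sketch. The Series names here are illustrative and not part of the original notebook.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Values are mutable: assigning to an existing label changes it.
s['b'] = 20
print(s.tolist())  # [1, 20, 3]

# Homogeneous data: a Series has exactly one dtype.
# An integer mixed with a float is stored as float64.
mixed_numbers = pd.Series([1, 2.5])
print(mixed_numbers.dtype)  # float64

# Values with no common numeric type fall back to the generic object dtype.
mixed_objects = pd.Series([1, 'a'])
print(mixed_objects.dtype)  # object
```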


In [8]:
# Accessing data from a Series by position:

# Retrieve the first element. Counting starts from zero,
# so the first element is stored at position 0, and so on.
# With the default integer index, label 0 is also position 0;
# s.iloc[0] is the explicit positional form.

data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data)
s[0]

'a'

In [9]:
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data)

s[:3]

0    a
1    b
2    c
dtype: object

In [10]:
# Accessing data from a Series by index label:

data = np.array(['a','b','c','d','e','f'])

s = pd.Series(data,index=[100,101,102,103,104,105])

s[102]

'c'
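When the index labels are themselves integers, as in the cell above, it can help to say explicitly whether a lookup is by label or by position. The following sketch uses the `.loc` and `.iloc` accessors for that; it is one common way to write it, not the only one.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e', 'f'],
              index=[100, 101, 102, 103, 104, 105])

# .loc looks up by index label.
print(s.loc[102])               # c

# .iloc looks up by integer position, regardless of the labels.
print(s.iloc[2])                # c
print(s.iloc[:3].tolist())      # ['a', 'b', 'c']

# .loc slices include both endpoints, unlike positional slices.
print(s.loc[100:102].tolist())  # ['a', 'b', 'c']
```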

# DataFrame in pandas

A DataFrame is a two-dimensional labeled data structure whose columns can hold different data types. It is usually presented in tabular form, with the data arranged in rows and columns.

DataFrame accepts many different kinds of input:

* Dict of 1D ndarrays, lists, dicts, or Series
* 2-D numpy.ndarray
* Structured or record ndarray
* A Series
* Another DataFrame
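A few of the input kinds listed above can be sketched as follows. The variable and column names (`arr`, `'x'`, `'y'`, `'value'`, and so on) are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd

# From a 2-D ndarray: rows and columns get default integer labels
# unless index/columns are supplied.
arr = np.arange(6).reshape(3, 2)
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])

# From a dict of Series: the row index is the union of the Series indexes,
# and missing entries are filled with NaN.
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df_from_series = pd.DataFrame(d)

# From a single Series: the Series name becomes the column name.
df_from_one_series = pd.DataFrame(pd.Series([7, 8, 9], name='value'))

print(df_from_array.shape)                  # (3, 2)
print(df_from_series.index.tolist())        # ['a', 'b', 'c', 'd']
print(df_from_one_series.columns.tolist())  # ['value']
```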



In [4]:
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1303171035, 207847528]}

df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

df

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasilia   207847528
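To see what "heterogeneous" means in practice, one can inspect the dtype of each column. The sketch below rebuilds the same DataFrame; `capitals_first` and `population` are names introduced only for this illustration.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# Each column carries its own dtype (text columns vs. an integer column).
print(df.dtypes)

# Selecting a single column returns a Series.
population = df['Population']
print(type(population).__name__)  # Series
print(population.dtype)           # int64
print(population.max())           # 1303171035

# With dict input, `columns` also selects and orders the columns to keep.
capitals_first = pd.DataFrame(data, columns=['Capital', 'Country'])
print(capitals_first.columns.tolist())  # ['Capital', 'Country']
```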


In [5]:
# From a dict of ndarrays / lists

d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}

df = pd.DataFrame(d)
df


   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0


In [6]:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0


In [11]:
# From a list of dicts
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
pd.DataFrame(data2)


   a   b     c
0  1   2   NaN
1  5  10  20.0
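The empty cell in column `c` comes from the first dict having no `'c'` key. A small sketch of how that missing value can be detected and, if desired, filled; the default of 0 and the name `filled` are arbitrary choices for illustration.

```python
import pandas as pd

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df2 = pd.DataFrame(data2)

# The first dict has no 'c' key, so that cell is NaN,
# which in turn makes column 'c' float64.
print(df2['c'].isna().tolist())  # [True, False]
print(df2['c'].dtype)            # float64

# One possible follow-up: replace the missing value with a default.
filled = df2.fillna({'c': 0})
print(filled['c'].tolist())      # [0.0, 20.0]
```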


# Renaming DataFrame columns in pandas

In [13]:
# DataFrame 1
d1 = {'Customer_id': pd.Series([1, 2, 3, 4, 5, 6]),
      'Product': pd.Series(['Oven', 'Oven', 'Oven', 'Television', 'Television', 'Television']),
      'State': pd.Series(['California', 'Texas', 'Georgia', 'Florida', 'Albama', 'virginia'])}
df1 = pd.DataFrame(d1)

df1

   Customer_id     Product       State
0            1        Oven  California
1            2        Oven       Texas
2            3        Oven     Georgia
3            4  Television     Florida
4            5  Television      Albama
5            6  Television    virginia


In [15]:
# Rename all the columns at once

df1.columns = ['Customer_unique_id', 'Product_type', 'Province']

df1

   Customer_unique_id Product_type    Province
0                   1         Oven  California
1                   2         Oven       Texas
2                   3         Oven     Georgia
3                   4   Television     Florida
4                   5   Television      Albama
5                   6   Television    virginia


In [16]:
# Rename a specific column by position (here, the first column).
# rename returns a new DataFrame, so assign the result back.

df1 = df1.rename(columns={df1.columns[0]: "customer"})
df1

   customer Product_type    Province
0         1         Oven  California
1         2         Oven       Texas
2         3         Oven     Georgia
3         4   Television     Florida
4         5   Television      Albama
5         6   Television    virginia
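Columns can also be renamed by name with a mapping passed to `DataFrame.rename`. The sketch below uses a shortened copy of the customer table and a result variable called `renamed`, both chosen only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'Customer_id': [1, 2, 3],
                    'Product': ['Oven', 'Oven', 'Television'],
                    'State': ['California', 'Texas', 'Florida']})

# Map old names to new names; columns not in the mapping keep their name.
# rename returns a new DataFrame and leaves df1 itself unchanged.
renamed = df1.rename(columns={'Customer_id': 'customer', 'State': 'Province'})

print(renamed.columns.tolist())  # ['customer', 'Product', 'Province']
print(df1.columns.tolist())      # ['Customer_id', 'Product', 'State']
```

Renaming by name avoids depending on column order, which is handy when the layout of the table may change.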
