## Pandas DataFrame
A DataFrame is a two-dimensional data structure that can be visualized as a table. Its two axes, **rows** and **columns**, are both labeled.

A DataFrame is most suitable for storing multivariate statistical data, with columns representing variables and rows representing cases (observations).
* The data within each column is homogeneous. That is, all values in a column have the same dtype, called the dtype of the column.
* Rows are labeled by the 'index' and columns are labeled by 'columns'.
* Different columns can be of different dtypes.
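
These properties can be checked on a small table. The following is only a sketch: the column names and values are made up for illustration and do not come from the examples in this section.

```python
import pandas as pd

# Hypothetical data: each column is one variable, each row one observation.
people = pd.DataFrame({
    'height': [1.62, 1.75, 1.80],      # floats
    'age': [31, 45, 27],               # integers
    'name': ['Asha', 'Ben', 'Chen'],   # strings
})

# Each column has a single dtype, but the dtypes differ between columns.
print(people.dtypes)

# Rows are labeled by the index, columns by the columns attribute.
print(list(people.index))
print(list(people.columns))
```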

## Creating a DataFrame


### Using List of Lists

In [2]:
from pandas import Series, DataFrame 
import numpy as np
df1 = DataFrame([[1.2, 12.5, None],[1.7,12.7, None],[1.3, 11.9, None],[1.6, 13.1, 6.2]])
df1

     0     1    2
0  1.2  12.5  NaN
1  1.7  12.7  NaN
2  1.3  11.9  NaN
3  1.6  13.1  6.2


Note that the index values for both the dimensions are created automatically. 
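
The automatically created labels are consecutive integers starting at 0. A minimal sketch, using a shortened version of the data above, shows this for both axes:

```python
import pandas as pd

df = pd.DataFrame([[1.2, 12.5], [1.7, 12.7], [1.3, 11.9]])

# With no labels supplied, both axes are numbered from 0.
print(list(df.index))     # [0, 1, 2]
print(list(df.columns))   # [0, 1]

# pandas stores such default labels as a RangeIndex.
print(type(df.index).__name__)
```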

The following statement specifies these indices explicitly:

In [3]:
from pandas import Series, DataFrame 
df2 = DataFrame([[1.2, 12.5],[1.7,12.7],[1.3, 11.9],[1.6, 13.1]],
              index = ['a', 'b', 'c', 'd'], columns = ['X', 'Y'])
df2

     X     Y
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


In [3]:
df2.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [4]:
type(df2.index)

pandas.core.indexes.base.Index

In [5]:
df2.columns

Index(['X', 'Y'], dtype='object')

In [6]:
type(df2.columns)

pandas.core.indexes.base.Index

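The row labels and column names are available as the attributes index and columns of the DataFrame. An individual column can be accessed either as an attribute or with square brackets.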

In [7]:
df2.X

a    1.2
b    1.7
c    1.3
d    1.6
Name: X, dtype: float64

In [8]:
df2['X']

a    1.2
b    1.7
c    1.3
d    1.6
Name: X, dtype: float64

In [9]:
df3 = df2  # df3 is a second name for the same DataFrame, not a copy
df3.columns = ['X Value', 'Y Value']
df3

   X Value  Y Value
a      1.2     12.5
b      1.7     12.7
c      1.3     11.9
d      1.6     13.1


In [10]:
df2  # the columns of df2 are renamed too, since df2 and df3 are the same object

   X Value  Y Value
a      1.2     12.5
b      1.7     12.7
c      1.3     11.9
d      1.6     13.1


In [6]:
df22 = df2.copy()  # an independent copy; changing df22 does not affect df2

In [8]:
df22.columns = ['U', 'V']
df22

     U     V
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


In [9]:
df2

     X     Y
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


In [11]:
df3['X Value']

a    1.2
b    1.7
c    1.3
d    1.6
Name: X Value, dtype: float64
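
Square brackets work for any column label, whereas attribute access only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute or method. The sketch below uses a hypothetical column named 'count' to show the clash:

```python
import pandas as pd

df = pd.DataFrame({'X Value': [1.2, 1.7], 'count': [3, 4]}, index=['a', 'b'])

# Bracket access works for any label, including one with a space.
x_values = df['X Value']

# 'count' is also the name of a DataFrame method, so attribute access
# returns that method rather than the column.
attribute_is_method = callable(df.count)
count_column = df['count']

print(attribute_is_method)    # True
print(list(count_column))     # [3, 4]
```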

In [12]:
df4 = df2.copy()  # independent copy of df2
df4.columns = ['X', 'Y']  # renames the columns of df4 only
df4

     X     Y
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


In [13]:
df2

   X Value  Y Value
a      1.2     12.5
b      1.7     12.7
c      1.3     11.9
d      1.6     13.1
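
The cells above contrast plain assignment with copy(). The same distinction can be sketched compactly; the variable names original, alias and independent are hypothetical. The sketch also uses rename(), which returns a new DataFrame instead of changing labels in place:

```python
import pandas as pd

original = pd.DataFrame({'X': [1.2, 1.7], 'Y': [12.5, 12.7]})

alias = original                # second name for the same object
independent = original.copy()   # separate object with its own labels

# Assigning to columns changes the object that both names refer to.
alias.columns = ['X Value', 'Y Value']

print(alias is original)            # True
print(list(original.columns))       # ['X Value', 'Y Value']
print(list(independent.columns))    # ['X', 'Y']

# rename() leaves the DataFrame it is called on unchanged.
renamed = independent.rename(columns={'X': 'U', 'Y': 'V'})
print(list(renamed.columns))        # ['U', 'V']
```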


In [14]:
df2.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [15]:
df2.columns

Index(['X Value', 'Y Value'], dtype='object')

### Using List of Series
Suppose x1 and y1 are two Series containing univariate data on the variables X and Y respectively. These two Series can be combined into a DataFrame.

In [16]:
x1 = Series([1.2, 1.7, 1.3, 1.6], index = ['a', 'b', 'c', 'd'])
y1 = Series([12.5, 12.7, 11.9, 13.1], index = ['a', 'b', 'c', 'd'])
df3 = DataFrame([x1, y1])
df3

      a     b     c     d
0   1.2   1.7   1.3   1.6
1  12.5  12.7  11.9  13.1


In [17]:
x2 = Series([1.2, 1.7, 1.3, 1.6], index = ['a', 'b', 'd', 'f'])
y2 = Series([12.5, 12.7, 11.9, 13.1], index = ['a', 'b', 'c', 'e']) # index of y2 differs from the index of x2
df4 = DataFrame([x2, y2])
df4

      a     b    d    f     c     e
0   1.2   1.7  1.3  1.6   NaN   NaN
1  12.5  12.7  NaN  NaN  11.9  13.1


Note that:
* The input Series are used as rows.
* The index values of the input Series are used as columns. Where a Series has no value for a label, the entry is missing (NaN).
* Rows are indexed automatically, as we did not specify an index.

The following statement produces a more convenient DataFrame:

In [18]:
df4 = DataFrame([x2, y2], index = ['X', 'Y']).T
df4

     X     Y
a  1.2  12.5
b  1.7  12.7
d  1.3   NaN
f  1.6   NaN
c  NaN  11.9
e  NaN  13.1


Note that the attribute T is used to transpose the resultant DataFrame, so that the Series x2 and y2 are organized as columns instead of rows.
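
When the goal is to place Series side by side as columns, pandas.concat with axis=1 is an alternative to building rows and transposing. The following is a minimal sketch that reuses the values of x2 and y2. It makes no claim about the order of the row labels in the result.

```python
import pandas as pd

x2 = pd.Series([1.2, 1.7, 1.3, 1.6], index=['a', 'b', 'd', 'f'])
y2 = pd.Series([12.5, 12.7, 11.9, 13.1], index=['a', 'b', 'c', 'e'])

# axis=1 puts each Series in its own column; keys supplies the column names.
df = pd.concat([x2, y2], axis=1, keys=['X', 'Y'])
print(df)
```

Labels that occur in only one of the Series get a missing value (NaN) in the other column, as in the transposed DataFrame above.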

### Using List of dicts
A DataFrame can also be created from a list of dictionaries. The behavior in this case is _almost the same_ as when creating it from a list of Series.

In [19]:
x = dict(a = 1.2, b = 1.7, c = 1.3, d = 1.6)
y = dict(a = 12.5, b = 12.7, c = 11.9, d = 13.1)
df5 = DataFrame([x, y], index = ['X', 'Y']).T
df5

     X     Y
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


This similar behavior is expected. Recall that a Series can be treated as a dictionary with the index values as the keys.
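
The parallel with Series extends to missing keys: a key that is absent from one dictionary gives a missing value in that row. A small sketch with made-up records (not the data used above):

```python
import pandas as pd

# Hypothetical records: the second has no 'b' and adds a new key 'c'.
records = [{'a': 1.2, 'b': 1.7}, {'a': 12.5, 'c': 11.9}]

df = pd.DataFrame(records, index=['X', 'Y'])
print(df)

# Every key seen in any dictionary becomes a column.
print(sorted(df.columns))   # ['a', 'b', 'c']
```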

### Using dict of lists
This is the most common way of creating a DataFrame. Each key becomes a column name and each list supplies the values of that column.

In [20]:
data = dict(X = [1.2, 1.7, 1.3, 1.6], Y = [12.5, 12.7, 11.9, 13.1])
DataFrame(data)

     X     Y
0  1.2  12.5
1  1.7  12.7
2  1.3  11.9
3  1.6  13.1


Note that there was no need to transpose the DataFrame, as we did when creating it from a list of Series or dicts.
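
Since each list becomes a column, all lists must have the same length. A minimal sketch with a deliberately shortened second list shows what happens otherwise:

```python
import pandas as pd

# 'Y' has two values while 'X' has three, so no table can be formed.
try:
    pd.DataFrame({'X': [1.2, 1.7, 1.3], 'Y': [12.5, 12.7]})
    error_raised = False
except ValueError as exc:
    error_raised = True
    print('ValueError:', exc)
```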

### Using dict of Series

In [21]:
x = Series([1.2, 1.7, 1.3, 1.6], index = ['a', 'b', 'c', 'd'])
y = Series([12.5, 12.7, 11.9, 13.1], index = x.index) # index for y will be same as index for x
data = {'X': x, 'Y': y}
DataFrame(data)

     X     Y
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


Note that the index values for the resultant DataFrame are taken from the index values of x and y.
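
Unlike lists, Series do not need to line up: values are aligned by index label rather than by position. The following sketch uses two Series with partly different labels. The values are illustrative.

```python
import pandas as pd

x = pd.Series([1.2, 1.7, 1.3], index=['a', 'b', 'c'])
y = pd.Series([12.7, 11.9, 13.1], index=['b', 'c', 'd'])

# The resulting index is the union of both indices; values are matched by label.
df = pd.DataFrame({'X': x, 'Y': y})
print(df)
```

Here 'a' has no Y value and 'd' has no X value, so those entries are NaN.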

### Using dict of dicts

In [22]:
x = dict(a = 1.2, b = 1.7, c = 1.3, d = 1.6)
y = dict(a = 12.5, b = 12.7, c = 11.9, d = 13.1)
DataFrame({'X':x, 'Y':y})

     X     Y
a  1.2  12.5
b  1.7  12.7
c  1.3  11.9
d  1.6  13.1


Note that the result is the same as in the case of a dict of Series, as expected.
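
In a dict of dicts, the outer keys become columns and the inner keys become row labels. If the outer keys should label the rows instead, DataFrame.from_dict with orient='index' can be used. A short sketch with the same data:

```python
import pandas as pd

x = dict(a=1.2, b=1.7, c=1.3, d=1.6)
y = dict(a=12.5, b=12.7, c=11.9, d=13.1)

# Default behavior: outer keys 'X' and 'Y' are the columns.
by_column = pd.DataFrame({'X': x, 'Y': y})

# orient='index': outer keys 'X' and 'Y' are the row labels.
by_row = pd.DataFrame.from_dict({'X': x, 'Y': y}, orient='index')

print(list(by_column.columns))   # ['X', 'Y']
print(list(by_row.index))        # ['X', 'Y']
```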

### Using numpy array

In [23]:
import numpy as np
data = np.array([[10, 11, 12, 13, 14, 15], np.random.randn(6)]).T
DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f'], columns = ['X', 'Y'])

      X         Y
a  10.0 -0.083457
b  11.0 -0.620211
c  12.0 -0.677445
d  13.0 -0.926922
e  14.0 -0.100967
f  15.0  1.434508


Here, the input array of shape (6,2) is transformed into a DataFrame containing 6 rows and 2 columns.
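
The integers 10 to 15 appear as floats in column X because a numpy array holds a single dtype, so mixing integers and floats in one array converts everything to float. The sketch below contrasts this with passing the arrays separately. The seeded random generator is an assumption made only so the sketch is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
ints = np.arange(10, 16)
floats = rng.standard_normal(6)

# Stacking both into one array forces a common dtype (float64).
stacked = np.array([ints, floats]).T
from_array = pd.DataFrame(stacked, columns=['X', 'Y'])

# A dict of arrays lets each column keep its own dtype.
from_dict = pd.DataFrame({'X': ints, 'Y': floats})

print(from_array.dtypes)
print(from_dict.dtypes)
```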

### Using dict of Objects

In [24]:
import pandas as pd
df6 = DataFrame({
  'U': 1.,
  'V': pd.Timestamp('20130102'),
  'W': Series(1, index=['a', 'b', 'c', 'd'], dtype='float32'),
  'X': np.array([3, 5] * 2, dtype='int32'),
  'Y': ["test", "train", "test", "train"],
  'Z': 'foo'
})
df6

     U          V    W  X      Y    Z
a  1.0 2013-01-02  1.0  3   test  foo
b  1.0 2013-01-02  1.0  5  train  foo
c  1.0 2013-01-02  1.0  3   test  foo
d  1.0 2013-01-02  1.0  5  train  foo


In [25]:
Series(1, index=['a', 'b', 'c', 'd'], dtype='float32')

a    1.0
b    1.0
c    1.0
d    1.0
dtype: float32

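Observe how each kind of object in the dictionary is used: the scalar values (U, V and Z) are repeated in every row, the array and the list (X and Y) supply one value per row, and the index of the Series W becomes the index of the DataFrame.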
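
When every value in the dictionary is a scalar, pandas has nothing from which to infer the number of rows, so an index must be passed explicitly. A minimal sketch:

```python
import pandas as pd

# Only scalars and no index: the number of rows is undefined.
try:
    pd.DataFrame({'U': 1.0, 'Z': 'foo'})
    needs_index = False
except ValueError:
    needs_index = True

# With an index, each scalar is repeated once per row label.
df = pd.DataFrame({'U': 1.0, 'Z': 'foo'}, index=['a', 'b', 'c'])
print(needs_index)   # True
print(df)
```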

The dtypes of the DataFrame can be obtained using the dtypes attribute.

In [26]:
df6.dtypes

U           float64
V    datetime64[ns]
W           float32
X             int32
Y            object
Z            object
dtype: object
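
Once the dtypes are known, columns can be converted with astype or selected by dtype with select_dtypes. The sketch below uses a reduced, two-row version of the DataFrame above. The target dtypes are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'W': pd.Series(1, index=['a', 'b'], dtype='float32'),
    'X': np.array([3, 5], dtype='int32'),
    'Y': ['test', 'train'],
})

# astype returns a new DataFrame; a dict converts only the listed columns.
converted = df.astype({'W': 'float64', 'X': 'int64'})

# select_dtypes keeps only the columns whose dtype matches.
numeric_only = df.select_dtypes(include='number')

print(converted.dtypes)
print(list(numeric_only.columns))   # ['W', 'X']
```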