# Pandas

Pandas is built around the ``DataFrame``: essentially a multidimensional array with attached row and column labels, which can hold heterogeneous types and missing data.

In [1]:
import pandas
pandas.__version__

'1.0.0'

In [3]:
import pandas as pd
import numpy as np

## Pandas Series Object

A Pandas `Series` is a one-dimensional array of indexed data.
You can think of it as a specialized Python dict that maps index labels to values.

In [12]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [7]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

In [10]:
data.count()

4

In [11]:
data.index

RangeIndex(start=0, stop=4, step=1)
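The index does not have to be the default integer range. As a minimal sketch (the name `labeled` and the string labels are illustrative, not part of the cells above), an explicit index can be passed when building the `Series`:

```python
import pandas as pd

# Same values as above, but with explicit string labels
# instead of the default RangeIndex (labels chosen for illustration)
labeled = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Look up a value by its label, much like a dict key
b_value = labeled['b']

# The index now holds the labels we supplied
labels = list(labeled.index)
```

Here `b_value` is the entry stored under the label `'b'`, and `labels` lists the index in the order it was given.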


A `Series` can also be constructed directly from a Python dict, in which case the dict keys become the index:

In [14]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64
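The dict analogy carries over to item access, while the array side adds operations such as slicing. A short sketch, rebuilding the same `population` data so the block stands alone (the names `california`, `has_texas`, and `subset` are illustrative):

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127,
                        'Florida': 19552860,
                        'Illinois': 12882135})

# Dict-style lookup by key
california = population['California']

# Membership tests check the index labels, as with dict keys
has_texas = 'Texas' in population

# Unlike a dict, a Series supports slicing; label slices with .loc
# include both endpoints
subset = population.loc['California':'New York']
```

Note that the label-based slice includes its end label, unlike a positional Python slice.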

## Pandas DataFrame Object

### DataFrame as a generalized NumPy array
If a ``Series`` is an analog of a one-dimensional array with flexible indices, a ``DataFrame`` is an analog of a two-dimensional array with both flexible row indices and flexible column names.

In [15]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64

In [16]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [17]:
states['population']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: population, dtype: int64
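Both sets of labels are available as attributes of the `DataFrame`. A brief sketch, rebuilding `states` from the data above so it runs on its own (the names `row_labels`, `column_labels`, and `area_column` are illustrative):

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
states = pd.DataFrame({'population': population, 'area': area})

# Row labels are shared by every column
row_labels = states.index

# Column labels are held in a separate Index object
column_labels = states.columns

# Selecting one column by name returns a Series
area_column = states['area']
```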

### Constructing DataFrame objects

A Pandas ``DataFrame`` can be constructed in a variety of ways.
Here we'll give several examples.

#### From a single Series object

A ``DataFrame`` is a collection of ``Series`` objects, and a single-column ``DataFrame`` can be constructed from a single ``Series``:

In [None]:
pd.DataFrame(population, columns=['population'])

#### From a list of dicts
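This section has no example in the source, so the following is only a minimal sketch with made-up data (the names `records`, `df`, and `partial` are illustrative). Each dict becomes one row, and keys missing from a dict are filled with `NaN`:

```python
import pandas as pd

# Each dict is one row; the keys become column names
records = [{'a': i, 'b': 2 * i} for i in range(3)]
df = pd.DataFrame(records)

# Dicts do not need identical keys; missing entries become NaN
partial = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
```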

#### From a dictionary of Series objects
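This is the approach already used to build `states` earlier. As a small self-contained sketch (using a two-state subset of the data above, with the `area` entries deliberately given in a different order), the columns are aligned by index label rather than by position:

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193})
# Same labels, different order: alignment is by label, not position
area = pd.Series({'Texas': 695662, 'California': 423967})

# Dict keys become column names; the Series are aligned on their index
states = pd.DataFrame({'population': population, 'area': area})

texas_area = states.loc['Texas', 'area']
```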

#### From a two-dimensional NumPy array
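The source gives no example here, so this is a hedged sketch with placeholder values and labels (`'foo'`, `'bar'`, `'a'`, `'b'`, `'c'` are arbitrary). Row and column labels can be supplied explicitly; if omitted, integer labels are used for both:

```python
import numpy as np
import pandas as pd

# A 3x2 array of placeholder values
values = np.arange(6).reshape(3, 2)

# Supply row and column labels explicitly
df = pd.DataFrame(values, columns=['foo', 'bar'], index=['a', 'b', 'c'])

# Without labels, both axes fall back to integer positions
default = pd.DataFrame(values)
```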

#### From a NumPy structured array
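No example is given in the source, so here is a minimal sketch with an arbitrary structured dtype (the field names `'A'` and `'B'` are placeholders). The field names of the structured array become the column names:

```python
import numpy as np
import pandas as pd

# A structured array with an integer field 'A' and a float field 'B'
arr = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])

# Each field becomes a column, keeping its own dtype
df = pd.DataFrame(arr)
```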