# A quick introduction to _`pandas`_ dataframes

### Define the input data

In [1]:
import pandas as pd

planets_dict = {'Planet': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
                'Distance': [36.0, 67.24, 92.9, 141.71, 483.88, 887.14, 1783.98, 2796.46],
                'Diameter': [3032.4, 7519, 7926.2, 4194, 88736, 74978, 32193, 30775]}

### Create a `pandas` dataframe

In [2]:
df = pd.DataFrame.from_dict(planets_dict)

print(df)

    Planet  Distance  Diameter
0  Mercury     36.00    3032.4
1    Venus     67.24    7519.0
2    Earth     92.90    7926.2
3     Mars    141.71    4194.0
4  Jupiter    483.88   88736.0
5   Saturn    887.14   74978.0
6   Uranus   1783.98   32193.0
7  Neptune   2796.46   30775.0


In [3]:
df.index

RangeIndex(start=0, stop=8, step=1)

In [4]:
df.columns

Index(['Planet', 'Distance', 'Diameter'], dtype='object')

### Set index

You can use one of the columns as the row index instead of the default integer range.

In [5]:
df = df.set_index('Planet')

df.head()

         Distance  Diameter
Planet
Mercury     36.00    3032.4
Venus       67.24    7519.0
Earth       92.90    7926.2
Mars       141.71    4194.0
Jupiter    483.88   88736.0


In [6]:
df.index

Index(['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus',
       'Neptune'],
      dtype='object', name='Planet')
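As a minimal sketch of what the index change does, the example below uses a shortened three-planet copy of the data (the names `planets`, `indexed`, and `restored` are made up for this illustration). It shows that `set_index` returns a new dataframe rather than modifying the original, and that `reset_index` turns the index back into an ordinary column.

```python
import pandas as pd

# Shortened copy of the planets data, for illustration only
planets = pd.DataFrame({
    'Planet': ['Mercury', 'Venus', 'Earth'],
    'Distance': [36.0, 67.24, 92.9],
    'Diameter': [3032.4, 7519, 7926.2],
})

# set_index returns a new dataframe; `planets` keeps its RangeIndex
indexed = planets.set_index('Planet')

# reset_index moves the index labels back into a regular column
restored = indexed.reset_index()

print(list(planets.columns))   # 'Planet' is still a column here
print(list(indexed.columns))   # 'Planet' is now the index, not a column
print(list(restored.columns))  # 'Planet' is a column again
```

This is why the cell above reassigns the result with `df = df.set_index('Planet')`.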

### Retrieve values

Select a specific row from the dataframe.

In [7]:
df.loc['Mercury']

Distance      36.0
Diameter    3032.4
Name: Mercury, dtype: float64

This operation returns a _`pandas`_ series. 

In [8]:
type(df.loc['Mercury'])

pandas.core.series.Series

A _`pandas`_ series is a one-dimensional labeled array.

Select a specific value from the series.

In [9]:
df.loc['Mercury']['Diameter']

3032.4

You can also grab this value directly from the dataframe in a single `.loc` call, without creating the intermediate series.

In [10]:
df.loc['Mercury', 'Diameter']

3032.4

The format is: `dataframe-name.loc[row-label, column-label]`.
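For comparison, here is a small sketch of the same lookup done in three ways on a shortened copy of the data (`df_small` and the other variable names are hypothetical). It assumes you want to see how the label-based `.loc` relates to the position-based `.iloc` and the scalar accessor `.at`, and it also shows that a label slice in `.loc` includes both endpoints.

```python
import pandas as pd

# Shortened copy of the planets data, indexed by planet name
df_small = pd.DataFrame(
    {'Distance': [36.0, 67.24, 92.9], 'Diameter': [3032.4, 7519.0, 7926.2]},
    index=pd.Index(['Mercury', 'Venus', 'Earth'], name='Planet'),
)

by_label = df_small.loc['Venus', 'Diameter']   # row label, column label
by_position = df_small.iloc[1, 1]              # row position, column position
scalar = df_small.at['Venus', 'Diameter']      # label lookup for a single value

# Unlike ordinary Python slices, a label slice keeps the end label
subset = df_small.loc['Mercury':'Venus', 'Distance']

print(by_label, by_position, scalar)
print(subset)
```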

Grab the diameters of _all_ planets. In other words, grab all values from a single column.

In [11]:
df.loc[:, 'Diameter']

Planet
Mercury     3032.4
Venus       7519.0
Earth       7926.2
Mars        4194.0
Jupiter    88736.0
Saturn     74978.0
Uranus     32193.0
Neptune    30775.0
Name: Diameter, dtype: float64

This returns a _`pandas`_ series. We can also grab the values from this series.

In [12]:
df.loc[:, 'Diameter'].values

array([ 3032.4,  7519. ,  7926.2,  4194. , 88736. , 74978. , 32193. ,
       30775. ])
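The `.values` attribute works, but the pandas documentation recommends the `to_numpy()` method for getting a NumPy array out of a series. The sketch below uses a shortened, hypothetical series named `diameters` to show that the result is a plain NumPy array without the planet labels.

```python
import numpy as np
import pandas as pd

# Shortened copy of the Diameter column, for illustration only
diameters = pd.Series(
    [3032.4, 7519.0, 7926.2],
    index=['Mercury', 'Venus', 'Earth'],
    name='Diameter',
)

arr = diameters.to_numpy()  # NumPy array of the values; the labels are dropped

print(type(arr))
print(arr.shape)
```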

We can also select multiple columns from the dataframe.

In [13]:
df.loc[:, ['Diameter', 'Distance']]

         Diameter  Distance
Planet
Mercury    3032.4     36.00
Venus      7519.0     67.24
Earth      7926.2     92.90
Mars       4194.0    141.71
Jupiter   88736.0    483.88
Saturn    74978.0    887.14
Uranus    32193.0   1783.98
Neptune   30775.0   2796.46


This returns a _`pandas`_ dataframe (not a series).
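The difference comes from how the column is specified: a single label gives a series, while a list of labels gives a dataframe, even when the list has only one entry. A minimal sketch on a shortened copy of the data (the variable names are made up for this example):

```python
import pandas as pd

# Shortened copy of the planets data, indexed by planet name
df_small = pd.DataFrame(
    {'Distance': [36.0, 67.24, 92.9], 'Diameter': [3032.4, 7519.0, 7926.2]},
    index=pd.Index(['Mercury', 'Venus', 'Earth'], name='Planet'),
)

as_series = df_small.loc[:, 'Diameter']    # single label: one-dimensional series
as_frame = df_small.loc[:, ['Diameter']]   # list of labels: two-dimensional dataframe

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```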

### Iterate through a dataframe

In [14]:
for i, rec in df.iterrows():
    print(f"Planet: {i}, Distance: {rec['Distance']}, Diameter: {rec['Diameter']}")

Planet: Mercury, Distance: 36.0, Diameter: 3032.4
Planet: Venus, Distance: 67.24, Diameter: 7519.0
Planet: Earth, Distance: 92.9, Diameter: 7926.2
Planet: Mars, Distance: 141.71, Diameter: 4194.0
Planet: Jupiter, Distance: 483.88, Diameter: 88736.0
Planet: Saturn, Distance: 887.14, Diameter: 74978.0
Planet: Uranus, Distance: 1783.98, Diameter: 32193.0
Planet: Neptune, Distance: 2796.46, Diameter: 30775.0
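`iterrows()` builds a series for every row, which can be slow on large dataframes. As an alternative sketch (not the approach used above), `itertuples()` yields lightweight named tuples in which the index is available as `Index` and each column as an attribute. The shortened dataframe `df_small` and the list `lines` are hypothetical names for this example.

```python
import pandas as pd

# Shortened copy of the planets data, indexed by planet name
df_small = pd.DataFrame(
    {'Distance': [36.0, 67.24, 92.9], 'Diameter': [3032.4, 7519.0, 7926.2]},
    index=pd.Index(['Mercury', 'Venus', 'Earth'], name='Planet'),
)

lines = []
for row in df_small.itertuples():
    # row.Index holds the row label; columns are accessed by attribute name
    lines.append(f"Planet: {row.Index}, Distance: {row.Distance}, Diameter: {row.Diameter}")

for line in lines:
    print(line)
```

Attribute access works here because the column names are valid Python identifiers.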


### Delete a dataframe

In [15]:
%whos

Variable       Type         Data/Info
-------------------------------------
df             DataFrame             Distance  Diamet<...>ptune   2796.46   30775.0
i              str          Neptune
pd             module       <module 'pandas' from '/h<...>ages/pandas/__init__.py'>
planets_dict   dict         n=3
rec            Series       Distance     2796.46\nDia<...>: Neptune, dtype: float64


In [16]:
del df

In [17]:
%whos

Variable       Type      Data/Info
----------------------------------
i              str       Neptune
pd             module    <module 'pandas' from '/h<...>ages/pandas/__init__.py'>
planets_dict   dict      n=3
rec            Series    Distance     2796.46\nDia<...>: Neptune, dtype: float64


After `del df`, the name `df` no longer exists in this session. If nothing else refers to the dataframe, Python can release its memory, which helps when the dataframe is large.
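Note that `del` removes a name, not necessarily the underlying object. The sketch below, with the hypothetical names `frame` and `alias`, shows that the data stays in memory for as long as another variable still refers to it.

```python
import pandas as pd

frame = pd.DataFrame({'Distance': [36.0, 67.24]}, index=['Mercury', 'Venus'])
alias = frame   # a second name bound to the same dataframe object

del frame       # removes only the name `frame`

# The dataframe is still reachable through `alias`, so it has not been freed
still_there = alias.loc['Venus', 'Distance']

# Using the deleted name now raises NameError
try:
    frame
    name_exists = True
except NameError:
    name_exists = False

print(still_there, name_exists)
```

To actually let Python reclaim the memory, every name that refers to the dataframe has to be deleted or rebound.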