# Dictionaries and DataFrames

## Dictionaries

* Dictionaries are (sort of) a generalized version of arrays. Instead of the `Index,Value` pairs of arrays, dictionaries use `Key:Value` pairs.
* They can be created via a comma-separated list of `Key:Value` pairs within curly braces `{}`.
* Dictionaries are at the heart of a lot of what goes on in Python "under-the-hood". For example, the attributes of most objects and the global names of a module are stored in dictionaries.
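As a small sketch of the points above (the `'ten'` key and the default value `0` are only illustrative), the following shows how to test whether a key exists and how to look up a key that may be missing:

```python
# Build a dictionary from a comma-separated list of Key:Value pairs.
numbers = {'one': 1, 'two': 2, 'three': 3}

# Membership tests look at the keys, not the values.
has_two = 'two' in numbers
has_ten = 'ten' in numbers

# numbers['ten'] would raise a KeyError because the key does not exist.
# .get() returns a default value instead of raising an error.
missing = numbers.get('ten', 0)

print(has_two, has_ten, missing)
```

Using `.get()` is one way to avoid a `KeyError` when you are not sure a key is present.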

In [1]:
numbers = {'one':1, 'two':2, 'three':3}

numbers

{'one': 1, 'two': 2, 'three': 3}

#### Access a `value` via the `key`

In [6]:
numbers['two']

2

#### New items can be added to the dictionary using indexing

In [11]:
numbers['ninety'] = 90

numbers

{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}

----

#### List the keys, the values, or loop over the `Key:Value` pairs



In [12]:
numbers.keys()

dict_keys(['one', 'two', 'three', 'ninety'])

In [13]:
numbers.values()

dict_values([1, 2, 3, 90])

In [10]:
for key, value in numbers.items():
    print(key, value)

one 1
two 2
three 3
ninety 90


---

# The `pandas` package - Python Data Analysis Library - `DataFrame`

In [14]:
import pandas as pd
import numpy as np

In [15]:
my_star_name = np.array(['Sirius','Canopus','Rigil_Kentaurus','Arcturus','Vega','Capella','Rigel'])
my_star_dist = np.array([8.6,74,4.3,34,25,41,1400])
my_star_appmag = np.array([-1.46,-0.72,-0.27,-0.04,0.03,0.08,0.12])

In [16]:
my_star_name,my_star_dist,my_star_appmag

(array(['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus', 'Vega',
        'Capella', 'Rigel'], dtype='<U15'),
 array([   8.6,   74. ,    4.3,   34. ,   25. ,   41. , 1400. ]),
 array([-1.46, -0.72, -0.27, -0.04,  0.03,  0.08,  0.12]))

In [19]:
star_table = pd.DataFrame(
    {'Name': my_star_name,
     'Distance': my_star_dist,
     'AppMag': my_star_appmag
}
)

In [20]:
star_table

              Name  Distance  AppMag
0           Sirius       8.6   -1.46
1          Canopus      74.0   -0.72
2  Rigil_Kentaurus       4.3   -0.27
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5          Capella      41.0    0.08
6            Rigel    1400.0    0.12


### Notice that each row has an `index` assigned to it.

In [21]:
print(star_table)

              Name  Distance  AppMag
0           Sirius       8.6   -1.46
1          Canopus      74.0   -0.72
2  Rigil_Kentaurus       4.3   -0.27
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5          Capella      41.0    0.08
6            Rigel    1400.0    0.12


In [22]:
star_table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
Name        7 non-null object
Distance    7 non-null float64
AppMag      7 non-null float64
dtypes: float64(2), object(1)
memory usage: 296.0+ bytes


In [23]:
star_table.describe()

         Distance    AppMag
count         7.0       7.0
mean        226.7 -0.322857
std    517.893203  0.579733
min           4.3     -1.46
25%          16.8    -0.495
50%          34.0     -0.04
75%          57.5     0.055
max        1400.0      0.12


In [24]:
star_table.count()

Name        7
Distance    7
AppMag      7
dtype: int64

In [25]:
star_table.min()

Name        Arcturus
Distance         4.3
AppMag         -1.46
dtype: object

##### `.min(), .max(), .mean(), .std(), .count()`
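These summary methods also work on a single column or on a subset of columns. The sketch below rebuilds the same table from plain lists so that it runs on its own; the names `numeric`, `summary`, and `nearest` are only illustrative, and `.agg()` is used here as one possible way to compute several statistics at once.

```python
import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})

# Keep only the numeric columns, then compute several statistics in one call.
numeric = star_table[['Distance', 'AppMag']]
summary = numeric.agg(['min', 'max', 'mean'])

# A single column gives a single number.
nearest = star_table['Distance'].min()

print(summary)
print(nearest)
```

Restricting the table to its numeric columns avoids mixing text results (such as the alphabetically first `Name`) with numeric ones.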

### Picking out pieces of a table


* `head()`
* `tail()`
* `loc[row, column]`

In [26]:
star_table

              Name  Distance  AppMag
0           Sirius       8.6   -1.46
1          Canopus      74.0   -0.72
2  Rigil_Kentaurus       4.3   -0.27
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5          Capella      41.0    0.08
6            Rigel    1400.0    0.12


In [27]:
star_table.head(2)

      Name  Distance  AppMag
0   Sirius       8.6   -1.46
1  Canopus      74.0   -0.72


In [30]:
star_table.tail(2)

      Name  Distance  AppMag
5  Capella      41.0    0.08
6    Rigel    1400.0    0.12


In [34]:
star_table.loc[2:3, :]

              Name  Distance  AppMag
2  Rigil_Kentaurus       4.3   -0.27
3         Arcturus      34.0   -0.04


In [35]:
star_table.loc[2:3, ['Name', 'AppMag']]

              Name  AppMag
2  Rigil_Kentaurus   -0.27
3         Arcturus   -0.04


In [36]:
star_table.loc[:, 'Distance']

0       8.6
1      74.0
2       4.3
3      34.0
4      25.0
5      41.0
6    1400.0
Name: Distance, dtype: float64

In [37]:
star_table['Distance']

0       8.6
1      74.0
2       4.3
3      34.0
4      25.0
5      41.0
6    1400.0
Name: Distance, dtype: float64
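The last two cells return the same column. As a hedged sketch (the variable names are illustrative), the example below checks that the two forms agree and shows that a list of column names returns a `DataFrame` rather than a `Series`:

```python
import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})

# Two ways to pull out one column; both give a pandas Series.
by_loc = star_table.loc[:, 'Distance']
by_name = star_table['Distance']
same_column = by_loc.equals(by_name)

# A list of column names (even a list of one) gives a DataFrame.
as_frame = star_table[['Distance']]

print(same_column)
print(type(by_name).__name__, type(as_frame).__name__)
```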

## Sorting (`.sort_values`)

In [38]:
star_table.sort_values(['Name'])

              Name  Distance  AppMag
3         Arcturus      34.0   -0.04
1          Canopus      74.0   -0.72
5          Capella      41.0    0.08
6            Rigel    1400.0    0.12
2  Rigil_Kentaurus       4.3   -0.27
0           Sirius       8.6   -1.46
4             Vega      25.0    0.03


In [39]:
star_table.sort_values(['Distance'])

              Name  Distance  AppMag
2  Rigil_Kentaurus       4.3   -0.27
0           Sirius       8.6   -1.46
4             Vega      25.0    0.03
3         Arcturus      34.0   -0.04
5          Capella      41.0    0.08
1          Canopus      74.0   -0.72
6            Rigel    1400.0    0.12


In [42]:
star_table.sort_values(
    ['Distance'],
    ascending=False
)

              Name  Distance  AppMag
6            Rigel    1400.0    0.12
1          Canopus      74.0   -0.72
5          Capella      41.0    0.08
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
0           Sirius       8.6   -1.46
2  Rigil_Kentaurus       4.3   -0.27


#### The original table is unchanged

In [43]:
star_table

              Name  Distance  AppMag
0           Sirius       8.6   -1.46
1          Canopus      74.0   -0.72
2  Rigil_Kentaurus       4.3   -0.27
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5          Capella      41.0    0.08
6            Rigel    1400.0    0.12


In [46]:
star_table.sort_values(
    ['Distance'],
    ascending=False,
    inplace=True
)

#### The original table is changed

In [47]:
star_table

              Name  Distance  AppMag
6            Rigel    1400.0    0.12
1          Canopus      74.0   -0.72
5          Capella      41.0    0.08
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
0           Sirius       8.6   -1.46
2  Rigil_Kentaurus       4.3   -0.27


#### Notice that the row-index has **NOT** been reordered!

In [48]:
star_table.loc[2:4, :]

Empty DataFrame
Columns: [Name, Distance, AppMag]
Index: []
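The empty result comes from the fact that `.loc` selects by index label, not by position: after the sort, label `2` sits below label `4`, so the slice `2:4` contains no rows. The sketch below reproduces this on a sorted copy. It also uses `.iloc`, which is not used elsewhere in this notebook and is introduced here only to contrast position-based selection with label-based selection.

```python
import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})

# Sort a copy; each row keeps its original index label.
sorted_table = star_table.sort_values('Distance', ascending=False)
labels = list(sorted_table.index)

# .loc slices from label 2 to label 4; label 2 is now the last row,
# so nothing lies between the two labels.
by_label = sorted_table.loc[2:4, :]

# .iloc slices by position (third through fifth row), ignoring labels.
by_position = sorted_table.iloc[2:5, :]

print(labels)
print(len(by_label))
print(by_position)
```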


## Resetting the index (`.reset_index`)

In [49]:
star_table.reset_index(drop=True, inplace=True)

In [50]:
star_table

              Name  Distance  AppMag
0            Rigel    1400.0    0.12
1          Canopus      74.0   -0.72
2          Capella      41.0    0.08
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5           Sirius       8.6   -1.46
6  Rigil_Kentaurus       4.3   -0.27


In [51]:
star_table.loc[2:4, :]

       Name  Distance  AppMag
2   Capella      41.0    0.08
3  Arcturus      34.0   -0.04
4      Vega      25.0    0.03
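Sorting and resetting the index can also be done without `inplace=True`, by assigning the result to a new name. The sketch below shows two possible ways; the names `tidy` and `same` are illustrative, and the `ignore_index=True` argument assumes pandas 1.0 or newer.

```python
import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})

# Option 1: chain the sort and the index reset, keeping the result in a new variable.
tidy = (
    star_table
    .sort_values('Distance', ascending=False)
    .reset_index(drop=True)
)

# Option 2: ask sort_values to renumber the rows itself (pandas >= 1.0).
same = star_table.sort_values('Distance', ascending=False, ignore_index=True)

print(tidy)
```

Both versions leave `star_table` itself untouched, which makes it easier to re-run cells in any order.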


## Picking out data (`.query`)

In [52]:
star_table.query("Distance < 35")

              Name  Distance  AppMag
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5           Sirius       8.6   -1.46
6  Rigil_Kentaurus       4.3   -0.27


In [53]:
star_table.query("Distance < 35").count()

Name        4
Distance    4
AppMag      4
dtype: int64

In [54]:
star_table.query("Distance < 35")['Name'].count()

4

In [55]:
star_table.query("Distance < 35 and AppMag < 0")

              Name  Distance  AppMag
3         Arcturus      34.0   -0.04
5           Sirius       8.6   -1.46
6  Rigil_Kentaurus       4.3   -0.27
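A query string can also refer to a Python variable by prefixing its name with `@`, and the same selection can be written with a boolean mask. This is a minimal sketch; `max_distance`, `via_query`, and `via_mask` are illustrative names, and the table is rebuilt in its original (unsorted) order so the example runs on its own.

```python
import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})

max_distance = 35

# @max_distance pulls the value of the Python variable into the query.
via_query = star_table.query("Distance < @max_distance and AppMag < 0")

# The equivalent boolean mask; each condition needs its own parentheses.
mask = (star_table['Distance'] < max_distance) & (star_table['AppMag'] < 0)
via_mask = star_table[mask]

print(via_query)
```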


## Methods

* The `pandas` package has a huge number of different ways to explore, manipulate, and extract data from a `DataFrame`.
* The [`DataFrame` reference page](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
lists all of the different methods that can be used on a `DataFrame`.
* These methods can be chained together, with each method acting on the result of the previous one, to explore a `DataFrame`.

----

#### Simple example: AppMag / 2 for all stars with Distance < 35

In [56]:
star_table

              Name  Distance  AppMag
0            Rigel    1400.0    0.12
1          Canopus      74.0   -0.72
2          Capella      41.0    0.08
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5           Sirius       8.6   -1.46
6  Rigil_Kentaurus       4.3   -0.27


In [57]:
star_table.query("Distance < 35")['AppMag'].div(2)

3   -0.020
4    0.015
5   -0.730
6   -0.135
Name: AppMag, dtype: float64

#### More complicated example: Find the name of the star in `star_table` with a distance closest to 30 l.y.

In [58]:
star_table

              Name  Distance  AppMag
0            Rigel    1400.0    0.12
1          Canopus      74.0   -0.72
2          Capella      41.0    0.08
3         Arcturus      34.0   -0.04
4             Vega      25.0    0.03
5           Sirius       8.6   -1.46
6  Rigil_Kentaurus       4.3   -0.27


In [59]:
my_distance_value = 30

#### We will just work with the `Distance` column

In [60]:
star_table.loc[:, 'Distance']

0    1400.0
1      74.0
2      41.0
3      34.0
4      25.0
5       8.6
6       4.3
Name: Distance, dtype: float64

#### `.sub()` subtracts a value

In [61]:
star_table.loc[:, 'Distance'].sub(my_distance_value)

0    1370.0
1      44.0
2      11.0
3       4.0
4      -5.0
5     -21.4
6     -25.7
Name: Distance, dtype: float64

#### `.abs()` absolute value

In [62]:
star_table.loc[:, 'Distance'].sub(my_distance_value).abs()

0    1370.0
1      44.0
2      11.0
3       4.0
4       5.0
5      21.4
6      25.7
Name: Distance, dtype: float64

#### `.idxmin()` the row-index of the minimum value

In [63]:
star_table.loc[:, 'Distance'].sub(my_distance_value).abs().idxmin()

3

In [64]:
star_table.loc[3,:]

Name        Arcturus
Distance          34
AppMag         -0.04
Name: 3, dtype: object

In [65]:
star_table['Name'][3]

'Arcturus'

In [66]:
my_min_index = (
    star_table
    .loc[:, 'Distance']
    .sub(my_distance_value)
    .abs()
    .idxmin()
)

star_table['Name'][my_min_index]

'Arcturus'
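The chain above can be wrapped in a small function so it can be reused for any distance. This is a sketch only: the function name `closest_star_name` and its arguments are made up for illustration, and it uses `.loc[row, column]` for the final lookup instead of `star_table['Name'][index]`.

```python
import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})


def closest_star_name(table, target_distance):
    """Return the Name of the row whose Distance is closest to target_distance."""
    min_index = (
        table
        .loc[:, 'Distance']
        .sub(target_distance)   # difference from the target
        .abs()                  # size of the difference
        .idxmin()               # row-index of the smallest difference
    )
    return table.loc[min_index, 'Name']


print(closest_star_name(star_table, 30))
print(closest_star_name(star_table, 5))
```

Because `.idxmin()` returns an index label, the lookup works whether or not the table has been sorted or had its index reset.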

## Saving a table (`.to_csv`)

In [67]:
star_table.to_csv('./Data/NewStarTable.csv', index=False)
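Note that `.to_csv()` does not create missing folders, so the `./Data` directory has to exist before the cell above is run. To check that a file was written as expected, it can be read back with `pd.read_csv()`. The sketch below writes to a temporary folder (an assumption made so that the example runs anywhere) rather than to `./Data`.

```python
import os
import tempfile

import pandas as pd

star_table = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
             'Vega', 'Capella', 'Rigel'],
    'Distance': [8.6, 74, 4.3, 34, 25, 41, 1400],
    'AppMag': [-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12],
})

# Write the table to a temporary folder, then read it straight back.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'NewStarTable.csv')
    star_table.to_csv(csv_path, index=False)   # index=False: do not save the row-index
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

With `index=False` the row-index is not written to the file, so the reloaded table has only the `Name`, `Distance`, and `AppMag` columns.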