# Analyzing Melville's journal entries
Looking at simple statistics about Melville's travels.

In [17]:
import pandas as pd
import numpy as np
from collections import Counter

In [18]:
og = pd.read_csv("data/melville.csv")

In [19]:
mv = og.dropna(subset=['Latitude', 'Longitude', 'Waypoint', 'Day'])

In [20]:
mv

Unnamed: 0,Latitude,Longitude,Year,Month,Day,Day_Name,Display_Date,Day/Route,Alt_Route,City,Waypoint,Type,Observed,Text,NN Notes,Editorial
0,51.5079242,-0.096461,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,London Bridge,w,,"Arrived at noon, ""crossed over at once into th...",,Current London Bridge built in 1970s
1,51.5135254,-0.0926984,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,Poultry St.,w,,,"From London Bridge Station, via King William S...",
3,51.5138486,-0.1005339,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,St. Paul's,o,,"""Passed St. Paul's to the Strand""",,
5,51.5071619,-0.126291,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,25 Craven St.,w,,"""Secured rooms at last. . . . Very cheap.""",,
6,"51.5157635,",-0.099289,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,"Queen's Hotel, St. Martin's Le Grand",w,,"""Went down to Queen's Hotel to inquire about o...","In St. Martin's Le Grand, near St. Paul's, a l...",Coordinates for St. Martin's Le Grand
7,51.5143533,-0.1008036,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,Paternoster Row,l,,,North of St.Paul's,
9,51.5128569,-0.122548,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,Drury Lane Theatre,w,,"""Went to Drury Lane""","Catherine St east of Covent Garden, 2m walk fr...",
14,51.5071619,-0.126291,1849.0,11.0,6.0,Tuesday,"November 6, 1849",1.0,,London,25 Craven St.,w,,"""a good wash and turned in early""",,
15,51.5071619,-0.126291,1849.0,11.0,7.0,Wednesday,"November 7, 1849",2.0,,London,25 Craven St.,w,,,,
19,51.5089025,-0.0804522,1849.0,11.0,7.0,Wednesday,"November 7, 1849",2.0,,London,Custom House,l,"""two disconsolates on the ship""","""infinite trouble with the cursed Customs""",Just west of the Tower,


## What do we wanna do?

Create an interactive map of all of Melville's travels and show where he is most likely to go to next given any starting location.

### but wut do we neeeed

Well, given any location, we want to be able to show you the probability that Melville will go to any of the other locations.

group by day so that you dont get overlap


we can use Counter objects for this to bascially keep counter dictionaries? Export it as JSON so that it can be easily used in D3.

So we want:

`
{
    "starting location 1": {
        "other location 1": count,
        "other location 2": count,
        ...
    },
    "starting location 2": {
        "other location 1": count,
        "other location 2": count,
        ...
    },
}
`

In [10]:
# aggregation example
aggregation = {
    "duration": {
        "total_dur": "sum",
        "avergage_dur": "mean",
        "num_calls": "count"
    },
    "date": {
        "max_date": "max",
        "min_date": "min",
        "num_data": lambda x: max(x) - min(x)
    },
    "nextwork": ["count", "max"]
}

# data[data['item'] == 'call'].groupby('month').agg(aggregation) 

In [11]:
# get all of the unique values
locs = list(mv.Waypoint.dropna().unique())

# get rid of unknown locations
locs = [l for l in locs if "?" not in l]

# make labelling more consistent
locs = [l.strip().replace(".","") for l in locs]

# convert all street to st
locs = [l.replace("Street","St") for l in locs]
location_counts = {}
for l in locs:
    location_counts[l] = {}

location_counts

{"1 St James's Place": {},
 '1 Upper Belgrave St': {},
 '101 Strand': {},
 '13  Paternoster Row': {},
 '13 Cork St': {},
 '13 Paternoster Row': {},
 '16 Little Britain': {},
 '186 Strand': {},
 '2 Bryanston Square': {},
 "22 St James's Place": {},
 '25 Craven St': {},
 '27 Bow St': {},
 '27 Upper Harley St': {},
 '32 Strand': {},
 '331 Strand': {},
 "331 Strand, Stibbs' Bookshop": {},
 '332 Strand': {},
 "35-43 Lincoln's Inn Fields": {},
 '37 Gordon Square': {},
 '37 Paternoster Row': {},
 '39 Paternoster Row': {},
 '4 Trafalgar Sq': {},
 '4 Trafalgar Sqare': {},
 '4 Trafalgar Square': {},
 "40 Lincoln's Inn Fields": {},
 '44 Dover St': {},
 '46 Portland Place': {},
 '50 Albemarle St': {},
 '8 Bishopsgate': {},
 '8 New Burlington St': {},
 '83-86 Aldgate St': {},
 '86 Fleet St': {},
 "9 King's Bench Walk, Inner Temple": {},
 'Adelphi Theatre, Strand': {},
 'Angel Inn, St John St': {},
 'Blackfriars': {},
 'Blackfriars Bridge': {},
 'Blackfriars Lane': {},
 'Blackfriars Road': {},
 'Bor

In [15]:
# second attempt: keys are the coordinates
"""
coord = {
"(54,0.2)": {
    "name": jeojgor,
    "text": grigrhighig,
    "other locations": {
        "(52,0048.44)": {
            "name": ghighr,
            "count": 40
        }
    }
}
}
"""

'\ncoord = {\n"(54,0.2)": {\n    "name": jeojgor,\n    "text": grigrhighig,\n    "other locations": {\n        "(52,0048.44)": {\n            "name": ghighr,\n            "count": 40\n        }\n    }\n}\n}\n'

In [51]:
#     for i in range(len(group_locs)-1):
        
#         curr_loc = group_locs[i].lstrip().rstrip()
#         next_loc = group_locs[i+1].lstrip().rstrip()
        
#         print("{} -> {}\n".format(curr_loc, next_loc))
        
#         if next_loc not in location_counts[curr_loc]: 
#             location_counts[curr_loc][next_loc] = 1
#         else:
#             location_counts[curr_loc][next_loc] += 1

toExport = {}
day_grouped = mv.groupby("Day")
for name, group in day_grouped:
    

#     group_locs = list(group.Waypoint.dropna())
#     group_locs = [l for l in group_locs if "?" not in l]
#     group_locs = [l.strip().replace(".","") for l in group_locs]    
#     group_locs = [l.replace("Street","St") for l in group_locs]    

    coords = list(zip(group.Latitude,group.Longitude))
    str_coords = [str(c) for c in coords]
    grouped = list(zip(str_coords,group.Waypoint,group.Text, coords))
    
    for i in range(len(grouped)-1):
        
        cur_g = grouped[i]
        next_g = grouped[i+1]
        print("{} -> {}".format(cur_g[1], next_g[1]))

        if cur_g[1] not in toExport:
            toExport[cur_g[1]] = {
                "latitude": cur_g[3][0],
                "longitude": cur_g[3][1],
                "next_locs": {}
            }
        else:
            print("\n")
            if next_g[1] not in toExport[cur_g[1]]["next_locs"]:
                toExport[cur_g[1]]["next_locs"][next_g[1]] = {"latitude": next_g[3][0],
                                                              "longitude": next_g[3][1],
                                                              "count": 1 }
            else:
                toExport[cur_g[1]]["next_locs"][next_g[1]]["count"] +=1

London Bridge -> Poultry St. 
Poultry St.  -> St. Paul's
St. Paul's -> 25 Craven St.
25 Craven St. -> Queen's Hotel, St. Martin's Le Grand
Queen's Hotel, St. Martin's Le Grand -> Paternoster Row
Paternoster Row -> Drury Lane Theatre
Drury Lane Theatre -> 25 Craven St.
25 Craven St. -> Custom House


Custom House -> 25 Craven St.
25 Craven St. -> 25 Craven St.


25 Craven St. -> 8 New Burlington St.


8 New Burlington St. -> 50 Albemarle St.
50 Albemarle St. -> National Gallery
National Gallery -> Chancery Lane
Chancery Lane -> Lincoln's Inn Fields
Lincoln's Inn Fields -> Holborn
Holborn -> 25 Craven Street
25 Craven Street -> Cheapside
Cheapside -> London Bridge?
London Bridge? -> St. James's Park
St. James's Park -> Green Park
Green Park -> Hyde Park Corner
Hyde Park Corner -> Hyde Park
Hyde Park -> Edgware Road and Marble Arch
Edgware Road and Marble Arch -> Adelphi Theatre, Strand
Adelphi Theatre, Strand -> 25 Craven St.
25 Craven St. -> Lincoln's Inn 


Lincoln's Inn  -> Gray's Inn

In [49]:
toExport

{" 22 St. James's Place": {'latitude': '51.5057579',
  'longitude': '-0.1427858',
  'next_locs': {}},
 "1 St. James's Place": {'latitude': '51.506122',
  'longitude': '-0.139319',
  'next_locs': {}},
 '1 Upper Belgrave St': {'latitude': '51.5003362',
  'longitude': '-0.1561976',
  'next_locs': {}},
 '101 Strand': {'latitude': '51.5114196',
  'longitude': '-0.1205727',
  'next_locs': {}},
 '13  Paternoster Row': {'latitude': '51.5148561',
  'longitude': '-0.1002868',
  'next_locs': {'Lombard St.': {'count': 1,
    'latitude': '51.5126124',
    'longitude': '-0.0893248'},
   'Tavistock St, Covent Garden': {'count': 1,
    'latitude': '51.5121019',
    'longitude': '-0.122703'}}},
 '13 Cork St': {'latitude': '51.5099915',
  'longitude': '-0.1435292',
  'next_locs': {}},
 '13 Cork St.': {'latitude': '51.5099915',
  'longitude': '-0.1435292',
  'next_locs': {}},
 '13 Paternoster Row': {'latitude': '51.5148528',
  'longitude': '-0.1002922',
  'next_locs': {'13 Paternoster Row': {'count': 1,


In [50]:
import json

with open('result.json', 'w') as fp:
    json.dump(toExport, fp)