# Time Maps

### Alternative visualization for event occurrence rates


```python
import pandas as pd
import numpy as np
from twython import Twython
import matplotlib.pyplot as plt
```

```python
import scipy.ndimage as ndi

# Assumes xcoords, ycoords (time before / after each event) and diffs
# (all gaps between consecutive events) are NumPy arrays defined earlier.
Nside = 256  # number of bins along x and y for the histogram
width = 8    # width (sigma) of the Gaussian blur along x and y, in bins

H = np.zeros((Nside, Nside))  # 'histogram' matrix counting the points in each grid square

max_diff = np.max(diffs)  # maximum time difference

# Scale the xy coordinates to the size of the matrix
# (subtract 1 since Python starts counting at 0, unlike Fortran and R)
x_heat = (Nside - 1) * xcoords / max_diff
y_heat = (Nside - 1) * ycoords / max_diff

for i in range(len(xcoords)):  # loop over all points to populate each bin
    # NumPy requires integer indices, so truncate to the integer part explicitly
    H[int(x_heat[i]), int(y_heat[i])] += 1

H = ndi.gaussian_filter(H, width)  # apply Gaussian blur
H = np.transpose(H)  # so that the orientation matches the scatter plot

plt.imshow(H, origin='lower')  # display H as an image
plt.show()
```

### Get Tweets

The Twitter API lets you retrieve up to the 3,200 most recent tweets from a user's timeline, at most 200 per request. Using Twython, I downloaded tweets from @FEMA.

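```python
import pandas as pd
from twython import Twython, TwythonError

# Placeholder credentials: replace with your own Twitter app keys.
APP_KEY = 'YOUR_APP_KEY'
APP_SECRET = 'YOUR_APP_SECRET'

def get_tweets(user):
    # App-only (OAuth 2) authentication. Calling Twython() without
    # credentials fails with "400 (Bad Request), Bad Authentication data".
    twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
    access_token = twitter.obtain_access_token()
    twitter = Twython(APP_KEY, access_token=access_token)

    rows = []
    max_id = None
    while True:
        params = {'screen_name': user, 'count': 200, 'include_rts': 1}
        if max_id is not None:
            params['max_id'] = max_id
        try:
            page = twitter.get_user_timeline(**params)
        except TwythonError as e:
            print(e)
            break
        if not page:
            break  # no older tweets left
        print(len(page))
        for tweet in page:
            # Keep the timestamp and the text of each tweet
            rows.append({'time': tweet['created_at'], 'Tweet': tweet['text']})
        # Page backwards: request only tweets older than the oldest one seen
        max_id = page[-1]['id'] - 1

    tweet_data = pd.DataFrame(rows, columns=['time', 'Tweet'])
    # created_at looks like "Wed Oct 10 20:19:24 +0000 2018"
    tweet_data['time'] = pd.to_datetime(tweet_data['time'], format='%a %b %d %H:%M:%S %z %Y')
    return tweet_data

tweet_data = get_tweets("FEMA")
tweet_data.head()
```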


#### Plotting the Data

```python
def make_histogram(data, time_column):
    times = data[time_column]

    # Create one bin per calendar day spanning the data
    bins = pd.date_range(times.min().normalize(), times.max().normalize(), freq='D')

    # Count tweets whose timestamp falls within [bin_date, bin_date + 1 day)
    bin_values = []
    for bin_date in bins:
        in_bin = (times >= bin_date) & (times < bin_date + pd.Timedelta(days=1))
        bin_values.append(in_bin.sum())

    positions = np.arange(len(bins))

    # Plot objects
    plt.bar(positions, bin_values, width=1.0)
    # Label about 10 evenly spaced days so the axis stays readable
    step = max(1, len(bins) // 10)
    plt.xticks(positions[::step], bins[::step].strftime('%Y-%m-%d'), rotation=45)
    plt.ylabel('Count of Tweets')
    plt.title('Tweets by Day')
    plt.show()

make_histogram(tweet_data, 'time')
```
    

For time series of events with many short bursts, such as ours, a typical time-series histogram does not give much information. Although we can see peaks at various times, we cannot see what is going on during those "bursts". To do so, we would have to subset the data for those periods and make our bins smaller.
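To make the limitation concrete, here is a small sketch on synthetic, made-up timestamps (not the FEMA data): two days with identical daily counts, one with tweets spread evenly every two hours and one with a single 12-minute burst. A daily histogram cannot tell them apart, while the median gap between consecutive tweets can.

```python
import numpy as np
import pandas as pd

# Synthetic (made-up) tweet times: 12 tweets on each of two days
spread = pd.Timestamp('2015-01-01') + pd.to_timedelta(np.arange(12) * 2, unit='h')
burst = pd.Timestamp('2015-01-02 09:00') + pd.to_timedelta(np.arange(12), unit='min')
times = pd.Series(spread.append(burst))

# Daily histogram: both days get the same count
daily_counts = times.dt.floor('D').value_counts().sort_index()

# Median gap (in seconds) between consecutive tweets within each day
gaps = times.groupby(times.dt.floor('D')).apply(
    lambda t: t.diff().dt.total_seconds().median()
)

print(daily_counts.tolist())  # [12, 12]
print(gaps.tolist())          # [7200.0, 60.0]
```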

### The Alternative: A Time Map
I read about this idea in a recent blog post at District Data Labs:
https://districtdatalabs.silvrback.com/time-maps-visualizing-discrete-events-across-many-timescales 

The author described a technique for visualizing many events across multiple timescales in a single image. It allows the viewer to quickly identify critical features, such as whether events occur on a timescale of milliseconds or months. It also shows whether an event is preceded or followed by a period of sparse event occurrences. The technique is adapted from the field of chaotic systems and was originally conceived to study the timing of water drops from a dripping faucet.

For each event, the graph plots the time since the previous event on the x-axis and the time until the next event on the y-axis. The result is the scatter plot below.
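A minimal sketch of how the time-map coordinates can be computed from a list of event times. The helper name `time_map_coords` is hypothetical; `diffs`, `x` and `y` play the roles of `diffs`, `xcoords` and `ycoords` in the heat-map cell earlier. The first and last events are dropped because each lacks one neighbor.

```python
import numpy as np

def time_map_coords(times):
    """Return (x, y) time-map coordinates for a 1-D array of event times."""
    times = np.sort(np.asarray(times, dtype=float))
    diffs = np.diff(times)  # gaps between consecutive events
    x = diffs[:-1]          # time since the previous event
    y = diffs[1:]           # time until the next event
    return x, y

# Events at t = 0, 1, 2, 10, 11: the interior events are 1, 2 and 10
x, y = time_map_coords([0, 1, 2, 10, 11])
print(x)  # [1. 1. 8.]
print(y)  # [1. 8. 1.]
```

The event at t = 2 lands at (1, 8): it closes a burst, so it sits high on the y-axis, while the event at t = 10, which opens a new burst, lands at (8, 1).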

<img src="quadrants_N_medium.png">

## Making a Time Map

### Transforming the Data

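```python
def find_intervals(data):
    # Sort chronologically (DataFrame.sort was removed from pandas)
    data = data.sort_values('time', ascending=True).reset_index(drop=True)
    # Use item assignment: attribute assignment does not create new columns.
    # Intervals are converted to seconds so they can be plotted.
    data['time_before'] = (data['time'] - data['time'].shift(1)).dt.total_seconds()
    data['time_after'] = (data['time'].shift(-1) - data['time']).dt.total_seconds()
    return data

tweet_data = find_intervals(tweet_data)
```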

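```python
plt.scatter(x=tweet_data['time_before'], y=tweet_data['time_after'], c='red', alpha=0.5)
# Log axes spread out gaps ranging from seconds to days
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Time since previous tweet')
plt.ylabel('Time until next tweet')
plt.title('Time Map of Tweets')
plt.show()
```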

Now we can distinguish the bulk of the data from its outliers. Points near the origin, with short gaps on both sides, are bursts that indicate major events, while points at the upper end of both axes show very slow periods. By subsetting the bursts, we can extract the "major events".
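One simple way to subset the bursts, sketched here as an assumption rather than the method from the District Data Labs post, is to chain consecutive tweets whose gap is at most some threshold and keep only chains with enough tweets. The helper `extract_bursts` and its defaults (a 600-second gap, at least 3 tweets) are hypothetical and would need tuning for the FEMA data.

```python
import pandas as pd

def extract_bursts(data, max_gap_seconds=600, min_size=3):
    """Hypothetical helper: label bursts of closely spaced tweets and keep
    only bursts with at least min_size tweets. Thresholds are assumptions."""
    data = data.sort_values('time').reset_index(drop=True)
    gap = data['time'].diff().dt.total_seconds()
    # A new burst starts whenever the gap to the previous tweet is too long;
    # the first tweet has a NaN gap, which also starts a burst.
    new_burst = ~(gap <= max_gap_seconds)
    data['burst_id'] = new_burst.cumsum()
    # Number of tweets in the burst each row belongs to
    sizes = data.groupby('burst_id')['time'].transform('count')
    return data[sizes >= min_size]

# Synthetic example: three tweets a minute apart, then two isolated groups
times = pd.Timestamp('2015-01-01') + pd.to_timedelta([0, 60, 120, 10000, 10060, 50000], unit='s')
example = pd.DataFrame({'time': times, 'Tweet': ['a', 'b', 'c', 'd', 'e', 'f']})
bursts = extract_bursts(example)
print(bursts['Tweet'].tolist())  # ['a', 'b', 'c']
```

Grouping by gap rather than by calendar day means a burst that straddles midnight stays in one piece, which a daily histogram would split.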