Google Colab link:

https://colab.research.google.com/drive/1lxbVNog01bLoqUL3AmQApbxGy8BJm4QV#scrollTo=VCHmkJhOiuMC

In [0]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import os
import pandas as pd
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# **This notebook compares quicksort, mergesort, and heapsort.**

**These algorithms are tested as they sort a dataframe of person data (first name, last name, and state abbreviation).**

In [0]:
import pandas as pd
import numpy as np
import time
import string
import random

In [0]:
# array of state abbreviations
states = ["SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
          "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD"]

In [0]:
# create data frame for person data
df1 = pd.DataFrame({'state': states}) 
df1['first'] = 'x'
df1['last'] = 'y'

In [0]:
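# generate data for persons
def id_generator(size=10, chars=string.ascii_lowercase):
    """Return a random string of `size` characters drawn from `chars`."""
    return ''.join(random.choice(chars) for _ in range(size))

# assign with .loc[row, column] to avoid chained-assignment warnings
for i in range(len(df1)):
    df1.loc[i, 'first'] = id_generator()
    df1.loc[i, 'last'] = id_generator()

# reset_index() keeps the old index as a new 'index' column
df1 = df1.reset_index()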

In [14]:
# check data frame
df1.head(5)

| | index | state | first | last |
|---|---|---|---|---|
| 0 | 0 | SD | vjxaxtgqcb | erderkmkai |
| 1 | 1 | TN | milybrjyuk | goekqqbeia |
| 2 | 2 | TX | yjjzocsdrl | gztvzbfqrv |
| 3 | 3 | UT | eryadrmqfx | egxwwzoxra |
| 4 | 4 | VT | puwogkcmsm | wchjalrnrh |


## **Quicksort function**

In [0]:
# quicksort alphabetical array (first element is the pivot)
def quicksort_alpha(items):
    # 'items' avoids shadowing the built-in name 'list'
    if not items:
        return []
    return (quicksort_alpha([x for x in items[1:] if x < items[0]])
            + [items[0]] +
            quicksort_alpha([x for x in items[1:] if x >= items[0]]))
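
A quick sanity check of the function above, as a minimal sketch rather than part of the original notebook. The function is restated with the parameter named `items`, and the result is compared against Python's built-in `sorted` on random strings. The helper `random_names` and its seed are hypothetical and only serve the illustration.

```python
import random
import string

def quicksort_alpha(items):
    """Recursive quicksort that uses the first element as the pivot."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    return (quicksort_alpha([x for x in rest if x < pivot])
            + [pivot]
            + quicksort_alpha([x for x in rest if x >= pivot]))

def random_names(n, size=10, seed=0):
    """Hypothetical helper: n random lowercase strings of length `size`."""
    rng = random.Random(seed)
    return [''.join(rng.choice(string.ascii_lowercase) for _ in range(size))
            for _ in range(n)]

names = random_names(50)
result = quicksort_alpha(names)

# The hand-written sort should agree with the built-in sort
matches_builtin = result == sorted(names)
print(matches_builtin)
```

Because the function builds new lists at every level, it returns a sorted copy and leaves the input list unchanged.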

### **Create arrays for first name, last name, and state**

In [0]:
unsorted_first_names = df1['first'].tolist()
unsorted_last_names = df1['last'].tolist()
unsorted_states = df1['state'].tolist()

### **Quicksort each array using my quicksort function and record the execution time**

In [0]:
# time.clock() was removed in Python 3.8; time.perf_counter() replaces it
# quicksort first name array
first_quicksort_start = time.perf_counter()
quicksorted_first = quicksort_alpha(unsorted_first_names)
first_quicksort_stop = time.perf_counter()
first_quicksort_time = first_quicksort_stop - first_quicksort_start

# quicksort last name array
last_quicksort_start = time.perf_counter()
quicksorted_last = quicksort_alpha(unsorted_last_names)
last_quicksort_stop = time.perf_counter()
last_quicksort_time = last_quicksort_stop - last_quicksort_start

# quicksort state array
state_quicksort_start = time.perf_counter()
quicksorted_state = quicksort_alpha(unsorted_states)
state_quicksort_stop = time.perf_counter()
state_quicksort_time = state_quicksort_stop - state_quicksort_start

### **Sort each array using Python's built-in sorted function and record the execution time**

In [0]:
# built-in sort of first name array
first_sort_start = time.perf_counter()
sorted_first = sorted(unsorted_first_names)
first_sort_stop = time.perf_counter()
first_sort_time = first_sort_stop - first_sort_start

# built-in sort of last name array
last_sort_start = time.perf_counter()
sorted_last = sorted(unsorted_last_names)
last_sort_stop = time.perf_counter()
last_sort_time = last_sort_stop - last_sort_start

# built-in sort of state array
state_sort_start = time.perf_counter()
sorted_state = sorted(unsorted_states)
state_sort_stop = time.perf_counter()
state_sort_time = state_sort_stop - state_sort_start

### **Mergesort each array using the numpy mergesort function**

In [0]:
# mergesort first name array
first_mergesort_start = time.perf_counter()
mergesorted_first = np.sort(unsorted_first_names, kind='mergesort')
first_mergesort_stop = time.perf_counter()
first_mergesort_time = first_mergesort_stop - first_mergesort_start

# mergesort last name array
last_mergesort_start = time.perf_counter()
mergesorted_last = np.sort(unsorted_last_names, kind='mergesort')
last_mergesort_stop = time.perf_counter()
last_mergesort_time = last_mergesort_stop - last_mergesort_start

# mergesort state array
state_mergesort_start = time.perf_counter()
mergesorted_state = np.sort(unsorted_states, kind='mergesort')
state_mergesort_stop = time.perf_counter()
state_mergesort_time = state_mergesort_stop - state_mergesort_start

### **Heapsort each array using the numpy heapsort function**

In [0]:
# heapsort first name array
first_heapsort_start = time.perf_counter()
heapsorted_first = np.sort(unsorted_first_names, kind='heapsort')
first_heapsort_stop = time.perf_counter()
first_heapsort_time = first_heapsort_stop - first_heapsort_start

# heapsort last name array
last_heapsort_start = time.perf_counter()
heapsorted_last = np.sort(unsorted_last_names, kind='heapsort')
last_heapsort_stop = time.perf_counter()
last_heapsort_time = last_heapsort_stop - last_heapsort_start

# heapsort state array
state_heapsort_start = time.perf_counter()
heapsorted_state = np.sort(unsorted_states, kind='heapsort')
state_heapsort_stop = time.perf_counter()
state_heapsort_time = state_heapsort_stop - state_heapsort_start
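
The start/stop pattern repeated in the cells above can be wrapped in a small helper. The sketch below is one possible way to do that and is not part of the original notebook: `best_time` and its `repeats` parameter are hypothetical names. It uses `time.perf_counter` and keeps the fastest of several runs, which tends to be less noisy than a single measurement for sub-millisecond sorts.

```python
import time

def best_time(sort_fn, data, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of sort_fn."""
    timings = []
    for _ in range(repeats):
        fresh = list(data)  # copy outside the timed region so every run sees unsorted input
        start = time.perf_counter()
        sort_fn(fresh)
        timings.append(time.perf_counter() - start)
    return min(timings)

sample = ["TX", "AL", "WY", "CA", "NY"]
builtin_time = best_time(sorted, sample)
print(builtin_time)
```

Any callable that takes a list works as `sort_fn`, for example `lambda xs: np.sort(xs, kind='mergesort')` when numpy is imported as `np`.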


## **Sort the person dataframe using quicksort, mergesort, and heapsort**

In [0]:
# quicksort df on 'first name'
df_quicksort_start = time.perf_counter()
quicksort_df = df1.sort_values(['first'], kind='quicksort')
df_quicksort_stop = time.perf_counter()
df_quicksort_time = df_quicksort_stop - df_quicksort_start

# mergesort df on 'first name'
df_mergesort_start = time.perf_counter()
mergesort_df = df1.sort_values(['first'], kind='mergesort')
df_mergesort_stop = time.perf_counter()
df_mergesort_time = df_mergesort_stop - df_mergesort_start

# heapsort df on 'first name'
df_heapsort_start = time.perf_counter()
heapsort_df = df1.sort_values(['first'], kind='heapsort')
df_heapsort_stop = time.perf_counter()
df_heapsort_time = df_heapsort_stop - df_heapsort_start

# default sort of df on 'first name' (sort_values defaults to kind='quicksort')
df_sort_start = time.perf_counter()
sort_df = df1.sort_values(['first'])
df_sort_stop = time.perf_counter()
df_sort_time = df_sort_stop - df_sort_start

### **Create a dataframe to compare the execution times**

In [25]:
# create dataframe comparing sort times (seconds)
# 'py_quicksort' row: built-in sorted() for the arrays, default sort_values() for the dataframe
# 'my_quicksort' row: quicksort_alpha() for the arrays, sort_values(kind='quicksort') for the dataframe
sort_methods = ['py_quicksort','my_quicksort','mergesort','heapsort']
first_name_array = [first_sort_time,first_quicksort_time,first_mergesort_time,first_heapsort_time]
last_name_array = [last_sort_time,last_quicksort_time,last_mergesort_time,last_heapsort_time]
state_array = [state_sort_time,state_quicksort_time, state_mergesort_time,state_heapsort_time]
df1_unsorted_array = [df_sort_time,df_quicksort_time,df_mergesort_time,df_heapsort_time]

time_table = pd.DataFrame(
        {'a_sort_methods':sort_methods,
         'first_name':first_name_array,
         'last_name':last_name_array,
         'state_array':state_array,
         'dataframe':df1_unsorted_array
         })

time_table

| | a_sort_methods | dataframe | first_name | last_name | state_array |
|---|---|---|---|---|---|
| 0 | py_quicksort | 0.000942 | 8.6e-05 | 6.5e-05 | 9e-05 |
| 1 | my_quicksort | 0.002943 | 0.0002 | 0.000181 | 0.00026 |
| 2 | mergesort | 0.001154 | 0.001566 | 0.000261 | 0.000168 |
| 3 | heapsort | 0.00087 | 0.00043 | 0.000258 | 0.000269 |


# **Sorted data frame**

In [27]:
quicksort_df

| | index | state | first | last |
|---|---|---|---|---|
| 6 | 6 | WA | bfmgcxfgdn | ewzbtqclaf |
| 28 | 28 | RI | dkavfeuspm | zlbvzfxepe |
| 7 | 7 | WV | dmxhzjlhgm | gdllxuolfm |
| 5 | 5 | VA | duhqcrdnyh | rlbnxfrfwn |
| 30 | 30 | MA | dxyoslhlsd | ottcanfumb |
| 3 | 3 | UT | eryadrmqfx | egxwwzoxra |
| 12 | 12 | AZ | gaxrcowtsj | vmvrudkhxm |
| 48 | 48 | ME | gfsnmatgbu | akajsqpjmh |
| 9 | 9 | WY | gllobrsmhf | hjoqfjtakm |
| 18 | 18 | FL | gsxmjqmfhi | vqajalkfgn |


# **Executive Summary**

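All four sort methods handled the 50-row data frame and the 50-element arrays quickly, each in under 3 ms. When sorting the data frame on first name, heapsort (0.00087 s) performed slightly better than the pandas default sort (0.000942 s), which uses quicksort, and better than the explicit quicksort run (0.002943 s). As heapsort's worst-case runtime is O(n log n) and classic quicksort's worst case is O(n^2), heapsort has the potential to outperform quicksort on unfavorable inputs. When sorting the arrays, Python's built-in sorted function (the py_quicksort row) performed the best for each. My quicksort function came second for the first-name and last-name arrays, and numpy's mergesort came second for the state array. Each timing is a single run on 50 elements, so differences this small may not be repeatable.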
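
The worst-case comparison above can be illustrated by counting comparisons instead of timing. The sketch below rests on stated assumptions and is not the notebook's code: `count_quicksort_comparisons` is a hypothetical instrumented first-element-pivot quicksort that counts one comparison per element per partition, and `heapsort_with_count` is a hypothetical textbook heapsort, not numpy's implementation. On already-sorted input of length n, the first-pivot quicksort needs n(n-1)/2 comparisons, while heapsort stays on the order of n log n.

```python
def count_quicksort_comparisons(items):
    """Comparisons made by a first-element-pivot quicksort (one per element per partition)."""
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return (len(rest)
            + count_quicksort_comparisons(smaller)
            + count_quicksort_comparisons(larger))

def heapsort_with_count(items):
    """Textbook max-heap heapsort; returns (sorted copy, number of comparisons)."""
    data = list(items)
    comparisons = 0

    def sift_down(root, end):
        nonlocal comparisons
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end:
                comparisons += 1
                if data[child] < data[child + 1]:
                    child += 1  # pick the larger of the two children
            comparisons += 1
            if data[root] >= data[child]:
                return
            data[root], data[child] = data[child], data[root]
            root = child

    n = len(data)
    for start in range(n // 2 - 1, -1, -1):  # build the max-heap
        sift_down(start, n)
    for end in range(n - 1, 0, -1):          # move the max to the end, shrink the heap
        data[0], data[end] = data[end], data[0]
        sift_down(0, end)
    return data, comparisons

already_sorted = list(range(200))  # worst case for a first-element pivot
quick_count = count_quicksort_comparisons(already_sorted)
heap_result, heap_count = heapsort_with_count(already_sorted)
print(quick_count, heap_count)
```

The input size of 200 is kept small so the recursive quicksort stays well within Python's default recursion limit on sorted input.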