1. I would like to solve an image-recognition problem: deciding whether an image contains a certain object. I have not settled on the exact object yet, since I want to keep my options open when choosing a dataset. Logistic regression is a good fit for this problem: you can feed the algorithm the individual RGB color values and have it output a binary decision on whether the image contains the object.
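
As a rough illustration of that idea, the sketch below scores a single average RGB triple with a logistic model and thresholds the resulting probability at 0.5. The function names, weights, bias, and sample color are made-up values chosen only to show the mechanics; they are not fitted parameters.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_contains_object(rgb, weights, bias, threshold=0.5):
    """Return (probability, decision) for one RGB feature vector."""
    prob = sigmoid(np.dot(weights, rgb) + bias)
    return prob, bool(prob >= threshold)

# Hypothetical parameters and input, chosen only for illustration.
example_weights = np.array([0.02, -0.01, -0.03])
example_bias = -1.0
example_rgb = np.array([150.0, 75.0, 5.0])

prob, decision = predict_contains_object(example_rgb, example_weights, example_bias)
print(f"probability={prob:.3f}, contains_object={decision}")
```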

2. I chose to use a dataset that contains data about images of oranges and grapefruits. 

[Link: https://www.kaggle.com/joshmcadams/oranges-vs-grapefruit ]

This dataset has 10,000 points, split equally between the two classes, and includes each fruit's diameter and weight. Conveniently, the RGB color values have already been extracted from each picture and are listed as averages. The red values tend to be near 150, the green near 75, and the blue near 0, which is about what one would expect for both fruits. There are no missing or invalid data points, and there do not appear to be any significant outliers: I sorted each feature for each type of fruit, and none of the maximum or minimum values seemed improbable enough to remove. Overall, the dataset is essentially ready to use as downloaded, which is why I'm glad I kept my options open in problem #1.
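
The checks described above could be reproduced along the following lines. This is a sketch, not the code that was actually used: the helper name `summarize_by_class` is made up, the column names are assumed to match those used in the code cell (`name`, `diameter`, `weight`, `red`, `green`, `blue`), and the tiny inline table with invented values only stands in for `citrus.csv`.

```python
import pandas as pd

FEATURES = ["diameter", "weight", "red", "green", "blue"]

def summarize_by_class(df, label_col="name", features=FEATURES):
    """Return class counts, total missing-value count, and per-class min/max."""
    counts = df[label_col].value_counts()
    n_missing = int(df[[label_col] + features].isna().sum().sum())
    extremes = df.groupby(label_col)[features].agg(["min", "max"])
    return counts, n_missing, extremes

# Stand-in rows with invented values; swap in pd.read_csv("citrus.csv").
sample = pd.DataFrame({
    "name": ["orange", "orange", "grapefruit", "grapefruit"],
    "diameter": [7.5, 8.1, 12.3, 11.8],
    "weight": [130.0, 142.5, 210.4, 201.9],
    "red": [160, 155, 148, 151],
    "green": [80, 78, 70, 72],
    "blue": [5, 9, 12, 3],
})

counts, n_missing, extremes = summarize_by_class(sample)
print(counts)
print("missing values:", n_missing)
print(extremes)
```

Looking at the per-class minimum and maximum of each feature is a quick way to spot values that are far outside the rest of the data.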

In [98]:
# Import TensorFlow, Pandas, and Numpy
import tensorflow as tf
import pandas as pd
import numpy as np
import math

#read in data and print first few values
data = pd.read_csv('citrus.csv')
data.head()

#initialize weights and b
weights = np.zeros(5)
b_val = 0


def evaluateSigmoid(value):
  #logistic function, applied elementwise to arrays
  return 1 / (1 + np.exp(-value))

#initialize data
X = data[['diameter', 'weight', 'red', 'green', 'blue']]
#label oranges as 1 and grapefruits as 0
Y = (data['name'] == 'orange').astype(int).to_numpy()

#3.)
#Variation 1: Vanilla (Batch) Gradient Descent
num_epochs = 350
learning_rate = 0.01
for epoch in range(num_epochs):
  #forward pass over the full dataset
  A = evaluateSigmoid(np.dot(weights, X.T) + b_val)
  #I had to change the 1 in the second portion to 1.000001 in case A = 1
  cost = -1/len(Y) * np.sum(Y * np.log(A) + (1-Y) * np.log(1.000001-A))
  #gradients of the cost, averaged over all samples
  dw = np.dot(X.T, (A-Y).T)/len(Y)
  db = np.sum(A-Y)/len(Y)
  weights = weights - learning_rate*dw
  b_val = b_val - learning_rate*db
  print('Batch Gradient Descent Epoch: '+str(epoch)+'   Cost: '+str(cost))

#Variation 2: Stochastic Gradient Descent
#(weights and b_val continue from the values left by Variation 1)
num_epochs = 1
for epoch in range(num_epochs):
  for i in range(len(Y)):
    #update on one sample at a time
    x_i = X[i:i+1]
    y_i = Y[i:i+1]
    A = evaluateSigmoid(np.dot(weights, x_i.T) + b_val)
    #note: dividing by len(Y) scales each single-sample step by 1/N
    dw = np.dot(x_i.T, (A-y_i).T)/len(Y)
    db = np.sum(A-y_i)/len(Y)
    weights = weights - learning_rate*dw
    b_val = b_val - learning_rate*db
  #evaluate the cost on the full dataset once per epoch
  A = evaluateSigmoid(np.dot(weights, X.T) + b_val)
  #I had to change the 1 in the second portion to 1.000001 in case A = 1
  cost = -1/len(Y) * np.sum(Y * np.log(A) + (1-Y) * np.log(1.000001-A))
  print('Stochastic Gradient Descent Epoch: '+str(epoch)+'   Cost: '+str(cost))

#4.)
#Optimization 1: Momentum
#(weights and b_val continue from the values left by the previous run)
num_epochs = 150
momentum_w = np.zeros(5)
momentum_b = 0
momentum_const = 0.9
for epoch in range(num_epochs):
  A = evaluateSigmoid(np.dot(weights, X.T) + b_val)
  #I had to change the 1 in the second portion to 1.000001 in case A = 1
  cost = -1/len(Y) * np.sum(Y * np.log(A) + (1-Y) * np.log(1.000001-A))
  dw = np.dot(X.T, (A-Y).T)/len(Y)
  db = np.sum(A-Y)/len(Y)
  #velocity: decayed running sum of past gradient steps
  momentum_w = momentum_const*momentum_w + learning_rate*dw
  momentum_b = momentum_const*momentum_b + learning_rate*db
  weights = weights - momentum_w
  b_val = b_val - momentum_b
  print('Momentum Gradient Descent Epoch: '+str(epoch)+'   Cost: '+str(cost))

#Optimization 2: Nesterov
#(weights and b_val continue from the values left by the previous run)
num_epochs = 150
momentum_w = np.zeros(5)
momentum_b = 0
for epoch in range(num_epochs):
  #evaluate the model at the look-ahead point (current parameters minus the decayed velocity)
  A = evaluateSigmoid(np.dot(weights - momentum_const*momentum_w, X.T) + (b_val - momentum_const*momentum_b))
  #I had to change the 1 in the second portion to 1.000001 in case A = 1
  #note: this cost is measured at the look-ahead point, not at the current weights
  cost = -1/len(Y) * np.sum(Y * np.log(A) + (1-Y) * np.log(1.000001-A))
  dw = np.dot(X.T, (A-Y).T)/len(Y)
  db = np.sum(A-Y)/len(Y)
  momentum_w = momentum_const*momentum_w + learning_rate*dw
  momentum_b = momentum_const*momentum_b + learning_rate*db
  weights = weights - momentum_w
  b_val = b_val - momentum_b
  print('Nesterov Gradient Descent Epoch: '+str(epoch)+'   Cost: '+str(cost))


Batch Gradient Descent Epoch: 0   Cost: 0.6931461805609453
Batch Gradient Descent Epoch: 1   Cost: 6.209987755714926
Batch Gradient Descent Epoch: 2   Cost: 6.907755279023269
Batch Gradient Descent Epoch: 3   Cost: 18.634564701547788
Batch Gradient Descent Epoch: 4   Cost: 6.907755279023269
Batch Gradient Descent Epoch: 5   Cost: 31.054503788242314
Batch Gradient Descent Epoch: 6   Cost: 6.907755279023269
Batch Gradient Descent Epoch: 7   Cost: 43.47444284233938
Batch Gradient Descent Epoch: 8   Cost: 6.907755279023269
Batch Gradient Descent Epoch: 9   Cost: 55.89438189643386
Batch Gradient Descent Epoch: 10   Cost: 6.907755279023269
Batch Gradient Descent Epoch: 11   Cost: 68.31432095052838
Batch Gradient Descent Epoch: 12   Cost: 6.907755279023269
Batch Gradient Descent Epoch: 13   Cost: 80.73426000462287
Batch Gradient Descent Epoch: 14   Cost: 6.9037799435697575
Batch Gradient Descent Epoch: 15   Cost: 93.12326372749679
Batch Gradient Descent Epoch: 16   Cost: 5.766489708347084
[... output truncated ...]

Momentum Gradient Descent Epoch: 68   Cost: 2.3043074009683546
Momentum Gradient Descent Epoch: 69   Cost: 2.2545663550493225
Momentum Gradient Descent Epoch: 70   Cost: 2.583654798466653
Momentum Gradient Descent Epoch: 71   Cost: 3.410204130381188
Momentum Gradient Descent Epoch: 72   Cost: 4.109124924348271
Momentum Gradient Descent Epoch: 73   Cost: 3.829638153924724
Momentum Gradient Descent Epoch: 74   Cost: 3.0190860313167027
Momentum Gradient Descent Epoch: 75   Cost: 2.463006021299585
Momentum Gradient Descent Epoch: 76   Cost: 2.3450178879579306
Momentum Gradient Descent Epoch: 77   Cost: 2.5507307632411114
Momentum Gradient Descent Epoch: 78   Cost: 3.0812967601866093
Momentum Gradient Descent Epoch: 79   Cost: 3.5533500329952363
Momentum Gradient Descent Epoch: 80   Cost: 3.4911249013272596
Momentum Gradient Descent Epoch: 81   Cost: 2.9914685997638126
Momentum Gradient Descent Epoch: 82   Cost: 2.5351715510206856
Momentum Gradient Descent Epoch: 83   Cost: 2.37864632893637
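
The oscillating batch gradient descent cost above is consistent with a step size that is too large for features on very different scales (red is near 150 while blue is near 0). One common remedy, which the cell above does not use, is to standardize each feature before training. The sketch below shows the idea on synthetic data; the function names and the generated values are assumptions for illustration, and clipping the sigmoid output before taking logs is swapped in here in place of the 1.000001 adjustment.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_descent(X, y, learning_rate=0.1, num_epochs=200):
    """Batch gradient descent for logistic regression; returns w, b, costs."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    costs = []
    for _ in range(num_epochs):
        a = sigmoid(X @ w + b)
        # Clip probabilities so that neither log term can reach log(0).
        a_safe = np.clip(a, 1e-12, 1 - 1e-12)
        costs.append(-np.mean(y * np.log(a_safe) + (1 - y) * np.log(1 - a_safe)))
        w -= learning_rate * (X.T @ (a - y)) / n_samples
        b -= learning_rate * np.mean(a - y)
    return w, b, costs

# Synthetic two-class data on large, mismatched scales (invented values).
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X_raw = np.column_stack([
    rng.normal(150 + 10 * y, 5),
    rng.normal(75 - 5 * y, 5),
])

w, b, costs = batch_gradient_descent(standardize(X_raw), y)
print(f"first cost={costs[0]:.4f}, last cost={costs[-1]:.4f}")
```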

4. Overall, Momentum gradient descent performs better than Nesterov. Both converge to their minimum cost at around the same time, but Momentum ends with a much smaller final cost. Momentum also triggers an overflow error at some point, although this does not affect the final result. Compared to vanilla gradient descent, both are much faster. Optimization algorithms are useful for this dataset, but it is small enough that the time difference is not significant.
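
One caveat when reading these numbers: in the cell above, `weights` and `b_val` are not reset between runs, so each method starts from wherever the previous one stopped. A like-for-like comparison would start every optimizer from the same initial parameters. The sketch below does this for classical momentum and Nesterov momentum on synthetic, standardized data. The `train` and `cost_and_grads` functions and the data are assumptions for illustration, and the output says nothing about the results on `citrus.csv`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, y):
    """Cross-entropy cost and its gradients at parameters (w, b)."""
    a = sigmoid(X @ w + b)
    a_safe = np.clip(a, 1e-12, 1 - 1e-12)  # keep both log terms finite
    cost = -np.mean(y * np.log(a_safe) + (1 - y) * np.log(1 - a_safe))
    return cost, X.T @ (a - y) / len(y), np.mean(a - y)

def train(X, y, nesterov, learning_rate=0.1, beta=0.9, num_epochs=150):
    """Momentum or Nesterov updates, always starting from zero parameters."""
    w, b = np.zeros(X.shape[1]), 0.0
    v_w, v_b = np.zeros(X.shape[1]), 0.0
    for _ in range(num_epochs):
        # Nesterov evaluates the gradient at the look-ahead point.
        w_eval = w - beta * v_w if nesterov else w
        b_eval = b - beta * v_b if nesterov else b
        _, dw, db = cost_and_grads(w_eval, b_eval, X, y)
        v_w = beta * v_w + learning_rate * dw
        v_b = beta * v_b + learning_rate * db
        w, b = w - v_w, b - v_b
    # Report the cost at the final parameters themselves.
    return cost_and_grads(w, b, X, y)[0]

# Synthetic standardized data (invented), shared by both runs.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 2)) + np.column_stack([y, -y])
X = (X - X.mean(axis=0)) / X.std(axis=0)

final_momentum = train(X, y, nesterov=False)
final_nesterov = train(X, y, nesterov=True)
print(f"momentum: {final_momentum:.4f}, nesterov: {final_nesterov:.4f}")
```

Reporting the final cost at the actual parameters for both methods, rather than at the look-ahead point, keeps the two numbers directly comparable.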