# Sigmoid Criteria Curve Fitting: Algebraic Approach
**Contributors:** Justin Kaufman, Marco Scialanga

**Achievement:** Curve fitting with SciPy. Curves used: $f = \frac{x^2}{\sqrt{1+x^2}}$ (which we call "algebraic") and $f = A + \frac{K-A}{(1+Qe^{-Bx})^{1/v}}$ (which we call "generalized logistic").
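
For reference, the generalized logistic above can be written as a NumPy function in the form `curve_fit` expects (independent variable first, then the parameters). This is only a minimal sketch: the name `gen_logistic` is a placeholder and is not necessarily what `sigmoid_fitting` defines.

```python
import numpy as np

def gen_logistic(x, A, K, Q, B, v):
    """Generalized logistic: A + (K - A) / (1 + Q * exp(-B * x)) ** (1 / v)."""
    x = np.asarray(x, dtype=float)
    # The exponent 1/v applies to the whole denominator, as in the formula.
    return A + (K - A) / (1.0 + Q * np.exp(-B * x)) ** (1.0 / v)

# With A=0, K=1, Q=1, B=1, v=1 this reduces to the standard logistic curve:
# it approaches A for very negative x and K for very positive x.
x = np.array([-50.0, 0.0, 50.0])
values = gen_logistic(x, A=0.0, K=1.0, Q=1.0, B=1.0, v=1.0)
print(values)
```

Here `A` and `K` act as the lower and upper asymptotes, which is a quick sanity check for any implementation of this curve.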

## Imports
**Most important packages:** PyMongo, SciPy, NumPy, pandas.

In [11]:
import sys
sys.path.append('../..')
import numpy as np
import matplotlib.pyplot as plt
from config import client
from pymongo import MongoClient
import math
from scipy.optimize import curve_fit
from winter21.mlpp.data_modeling.sigmoid_fitting import *
import pandas as pd

## Connection with Compass
**Dataset:** osu_random_db.

In [7]:
db = client['osu_random_db']

## Loading the Ids
**Collection:** beatmap_criteria_curve.

In [8]:
# Fetch only the _id field of every document in the collection
cursor = db["beatmap_criteria_curve"].find({}, {"_id": 1})
Ids = [doc["_id"] for doc in cursor]

## Storing New Data
**Objective:** store information about our curve fits in the collection.

In [7]:
# Run all CDF curve fits & store in beatmap_criteria_curve collection
store_genLog(Ids, db)
store_alg(Ids, db)

  y_temp = np.asarray(beatmap['no_mod']['n_pass'])/np.asarray(beatmap['no_mod']['total'])
  f = A + (K-A)/(1+Q*np.exp(-B*x)**(1/v))


KeyboardInterrupt: 
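
The first fragment in the output above shows the pass rate being computed as `n_pass / total`. If some bins have `total == 0`, that division produces NaN or inf and NumPy emits a warning, which is presumably what was printed here. One way to guard against this is sketched below; `pass_rate` is a hypothetical helper and not part of `sigmoid_fitting`.

```python
import numpy as np

def pass_rate(n_pass, total):
    """Return n_pass / total, with NaN wherever total is zero."""
    n_pass = np.asarray(n_pass, dtype=float)
    total = np.asarray(total, dtype=float)
    out = np.full(total.shape, np.nan)
    # Divide only where total > 0; other entries keep their NaN placeholder.
    np.divide(n_pass, total, out=out, where=total > 0)
    return out

rates = pass_rate([5, 0, 3], [10, 0, 4])
mask = np.isfinite(rates)  # keep only bins that actually have data
print(rates, mask)
```

Filtering `x` and `y` with such a mask before calling `curve_fit` avoids passing non-finite values to the optimizer.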

## Success Rate Table 
**Objective:** create a table to compare success rate of curve fits.

In [7]:
# Copy the attrib_id 17 records for our beatmaps into a temporary collection (dropped below)
collection = db["osu_beatmaps_attribs_modZero"]
db.attrib_17.insert_many(
    collection.aggregate([{'$match':{"beatmap_id": {'$in': Ids}, "attrib_id": 17}}]))

a = fit_lowDiff(db)
b = fit_mediumDiff(db)
c = fit_highDiff(db)
d = fit_all(db)

db["attrib_17"].drop()

dfSuccess = pd.DataFrame()
dfSuccess.insert(0, "Function", ["Generalized Logistic", "Algebraic"])
dfSuccess.insert(1, "SuccessRateLowDiff", [a[0], a[1]])
dfSuccess.insert(2, "SuccessRateMediumDiff", [b[0], b[1]])
dfSuccess.insert(3, "SuccessRateHighDiff", [c[0], c[1]])
dfSuccess.insert(4, "OverallSuccessRate", [d[0], d[1]])
dfSuccess

NameError: name 'db' is not defined
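
The notebook does not spell out what counts as a successful fit. A plausible reading, assumed here, is that a fit succeeds when `curve_fit` converges and fails when it raises `RuntimeError` (for example after exhausting `maxfev`). The sketch below computes a success rate under that assumption on synthetic data; `alg_sigmoid` and `try_fit` are hypothetical names, and the two-parameter form of the algebraic sigmoid is chosen only for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def alg_sigmoid(x, a, b):
    """Hypothetical two-parameter algebraic sigmoid rising from 0 to 1."""
    z = a * (x - b)
    return 0.5 * (1.0 + z / np.sqrt(1.0 + z ** 2))

def try_fit(func, x, y, p0, maxfev=1000):
    """Return the fitted parameters, or None if curve_fit does not converge."""
    try:
        popt, _ = curve_fit(func, x, y, p0=p0, maxfev=maxfev)
        return popt
    except RuntimeError:
        return None

# Synthetic, noise-free curves standing in for per-beatmap data.
x = np.linspace(-3.0, 3.0, 50)
true_params = [(1.5, 0.2), (2.0, 0.5), (0.8, -0.5)]
datasets = [alg_sigmoid(x, a, b) for a, b in true_params]

results = [try_fit(alg_sigmoid, x, y, p0=[1.0, 0.0]) for y in datasets]
success_rate = sum(r is not None for r in results) / len(results)
print(success_rate)
```

Counting failures this way keeps a single non-converging beatmap from aborting the whole batch.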

## Mean Squared Error Table 
**Objective:** create a table to compare the mean squared error of the curve fits.

In [9]:
dfMse = pd.DataFrame()
dfMse.insert(0, "Function", ["Generalized Logistic", "Algebraic"])
dfMse.insert(1, "AverageMseLowDiff", [a[2], a[3]])
dfMse.insert(2, "AverageMseMediumDiff", [b[2], b[3]])
dfMse.insert(3, "AverageMseHighDiff", [c[2], c[3]])
dfMse.insert(4, "OverallAverageMse", [d[2], d[3]])
dfMse

NameError: name 'pd' is not defined
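
As an illustration of how per-bucket averages like `AverageMseLowDiff` could be produced, the sketch below computes a mean squared error and averages it within difficulty buckets using pandas. The data, the column names, and the bucket edges (3 and 5) are all made up for the example; they are not the thresholds used by `fit_lowDiff`, `fit_mediumDiff`, or `fit_highDiff`.

```python
import numpy as np
import pandas as pd

def mse(y_true, y_pred):
    """Mean squared error between observed and fitted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical per-beatmap results: a difficulty value and the MSE of its fit.
per_map = pd.DataFrame({
    "difficulty": [1.5, 2.5, 4.0, 4.5, 6.0, 7.0],
    "mse": [0.010, 0.030, 0.020, 0.040, 0.050, 0.070],
})

# Placeholder bucket edges, chosen only for this example.
per_map["bucket"] = pd.cut(
    per_map["difficulty"],
    bins=[0, 3, 5, np.inf],
    labels=["low", "medium", "high"],
)
avg_mse = per_map.groupby("bucket", observed=True)["mse"].mean()
print(avg_mse)
```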

## Example

In [12]:
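# Fetch the data once instead of querying twice
data = get_x_and_y(397536, db)
x, y = data[0], data[1]
# Note: curve_fit returns a (parameters, covariance) tuple, so popt holds both
popt = curve_fit(algFunc, x, y, maxfev = 1000)
plot_fit_alg(popt, 397536, db, x, y)
plot_fit_genLog(popt, 397536)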

NameError: name 'plt' is not defined
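
Since the example cell did not run to completion, here is a self-contained sketch of the same workflow on synthetic data. It shows that `curve_fit` returns two objects, the optimal parameters and their covariance matrix, and how to evaluate the fitted curve. The function `alg_sigmoid` and its two-parameter form are assumptions made for illustration; they are not the `algFunc` from `sigmoid_fitting`.

```python
import numpy as np
from scipy.optimize import curve_fit

def alg_sigmoid(x, a, b):
    """Hypothetical two-parameter algebraic sigmoid rising from 0 to 1."""
    z = a * (x - b)
    return 0.5 * (1.0 + z / np.sqrt(1.0 + z ** 2))

# Synthetic data: a known curve plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = alg_sigmoid(x, 2.0, 0.5) + rng.normal(0.0, 0.01, x.size)

# curve_fit returns the best-fit parameters and their covariance matrix.
popt, pcov = curve_fit(alg_sigmoid, x, y, p0=[1.0, 0.0], maxfev=1000)
perr = np.sqrt(np.diag(pcov))   # one-sigma uncertainty of each parameter
y_fit = alg_sigmoid(x, *popt)   # fitted curve evaluated on the data grid
print(popt, perr)
```

The arrays `x`, `y`, and `y_fit` can then be passed to `matplotlib` to overlay the fitted curve on the data.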

## Conclusion
The algebraic (alg) function has a much higher success rate than the generalized logistic (genLog), probably because it has fewer parameters. The genLog's slight advantage in mean squared error therefore does not justify using it instead of alg. However, the nature of the data does not let us decide which model would work better with ideal distributions.