# Sigmoid Criteria Curve Fitting: Algebraic Approach
**Contributors:** Justin Kaufman, Marco Scialanga

**Achievement:** Curve fitting with SciPy (`scipy.optimize.curve_fit`). Curves used: the algebraic function x^2 / sqrt(1+x^2) and the generalized logistic function.
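
As a rough illustration of the fitting step, the sketch below fits both curve families to synthetic data with `scipy.optimize.curve_fit`. It is not the code in `mlpp.data_modeling.sigmoid_fitting`, which is not shown here. The names `gen_logistic` and `alg_sigmoid`, their parameterizations, the initial guesses, and the synthetic data are all assumptions. In particular, the algebraic curve uses the textbook sigmoid z / sqrt(1 + z^2), rescaled to the interval (0, 1), which may differ from the expression this notebook names.

```python
import numpy as np
from scipy.optimize import curve_fit


def gen_logistic(x, a, k, b, nu, m):
    """Generalized logistic (Richards) curve; a and k are the lower and upper asymptotes."""
    return a + (k - a) / (1.0 + np.exp(-b * (x - m))) ** (1.0 / nu)


def alg_sigmoid(x, scale, shift):
    """Textbook algebraic sigmoid z / sqrt(1 + z^2), rescaled from (-1, 1) to (0, 1)."""
    z = scale * (x - shift)
    return 0.5 * (1.0 + z / np.sqrt(1.0 + z ** 2))


# Synthetic, noisy CDF-like data (hypothetical; not taken from osu_random_db).
rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 200)
y = alg_sigmoid(x, 1.5, 0.5) + rng.normal(0.0, 0.01, x.size)

# Two-parameter fit: usually well conditioned.
alg_params, _ = curve_fit(alg_sigmoid, x, y, p0=[1.0, 0.0])

# Five-parameter fit: more likely to fail to converge, so the failure is caught.
try:
    gen_params, _ = curve_fit(
        gen_logistic, x, y, p0=[0.0, 1.0, 1.0, 1.0, 0.0], maxfev=10000
    )
except (RuntimeError, ValueError):
    gen_params = None

print("algebraic parameters:", alg_params)
print("generalized logistic parameters:", gen_params)
```

The two-parameter algebraic curve gives the optimizer far fewer directions in which to wander than the five-parameter generalized logistic, which is one plausible reason for a difference in convergence behavior.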

## Imports
**Most important packages:** PyMongo, SciPy, NumPy, pandas.

In [18]:
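import sys
sys.path.append('../..')  # make the project root importable (config, mlpp)

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pymongo import MongoClient
from scipy.optimize import curve_fit

from config import client  # preconfigured MongoDB client
# Provides store_genLog, store_alg, fit_lowDiff, fit_mediumDiff, fit_highDiff, fit_all
from mlpp.data_modeling.sigmoid_fitting import *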

## Connection with Compass
**Dataset:** osu_random_db.

In [19]:
db = client['osu_random_db']

## Loading the Ids
**Collection:** beatmap_criteria_curve.

In [20]:
# Project only the _id field and collect the ids into a list
cursor = db["beatmap_criteria_curve"].find({}, {"_id": 1})
Ids = [doc["_id"] for doc in cursor]

## Storing New Data
**Objective:** store information about our curve fits in the collection.

In [21]:
# Run all CDF curve fits & store in beatmap_criteria_curve collection
store_genLog(Ids, db)
store_alg(Ids, db)

NameError: name 'np' is not defined

## Success Rate Table
**Objective:** create a table to compare the success rates of the curve fits.

In [7]:
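# Copy the attrib_id 17 documents for our beatmaps into a temporary collection
collection = db["osu_beatmaps_attribs_modZero"]
db.attrib_17.insert_many(
    collection.aggregate([{'$match': {"beatmap_id": {'$in': Ids}, "attrib_id": 17}}]))

# Each result holds [genLog success rate, alg success rate, genLog MSE, alg MSE]
a = fit_lowDiff(db)
b = fit_mediumDiff(db)
c = fit_highDiff(db)
d = fit_all(db)

# Remove the temporary collection
db["attrib_17"].drop()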

dfSuccess = pd.DataFrame({
    "Function": ["Generalized Logistic", "Algebraic"],
    "SuccessRateLowDiff": [a[0], a[1]],
    "SuccessRateMediumDiff": [b[0], b[1]],
    "SuccessRateHighDiff": [c[0], c[1]],
    "OverallSuccessRate": [d[0], d[1]],
})
dfSuccess

NameError: name 'db' is not defined
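
The notebook does not show how `fit_lowDiff` and its siblings define a successful fit. One plausible reading, offered here only as an assumption, is that a fit succeeds when `curve_fit` converges without raising `RuntimeError`. The helper `fit_success_rate`, the `line` model, and the data below are hypothetical and serve only to illustrate that reading.

```python
import numpy as np
from scipy.optimize import curve_fit


def fit_success_rate(model, datasets, p0):
    """Return the fraction of (x, y) datasets on which curve_fit converges.

    Assumption: a fit counts as a success when curve_fit does not raise
    RuntimeError, which is how SciPy signals that no optimum was found.
    """
    successes = 0
    for x, y in datasets:
        try:
            curve_fit(model, x, y, p0=p0)
            successes += 1
        except RuntimeError:
            pass
    return successes / len(datasets)


def line(x, slope, intercept):
    """Simple stand-in model; any curve_fit-compatible callable works here."""
    return slope * x + intercept


# Hypothetical, well-behaved datasets, used only to exercise the helper.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
datasets = [(x, 2.0 * x + 1.0 + rng.normal(0.0, 0.05, x.size)) for _ in range(5)]

rate = fit_success_rate(line, datasets, p0=[1.0, 0.0])
print("success rate:", rate)
```

Under this definition, a model with more free parameters can score lower simply because the optimizer runs out of function evaluations more often, independently of how well the curve describes the data when it does converge.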

## Mean Squared Error Table
**Objective:** create a table to compare the mean squared error of the curve fits.

In [9]:
# a, b, c, d are the fit results computed in the Success Rate Table cell
dfMse = pd.DataFrame({
    "Function": ["Generalized Logistic", "Algebraic"],
    "AverageMseLowDiff": [a[2], a[3]],
    "AverageMseMediumDiff": [b[2], b[3]],
    "AverageMseHighDiff": [c[2], c[3]],
    "OverallAverageMse": [d[2], d[3]],
})
dfMse

NameError: name 'pd' is not defined
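
For reference, the mean squared error of a single fit is the average of the squared residuals between the observed values and the fitted curve. The sketch below shows that calculation; the helper name `fit_mse` and the toy model and data are hypothetical, and the averaging over beatmaps done inside the `fit_*` functions is not shown in this notebook.

```python
import numpy as np


def fit_mse(model, params, x, y):
    """Mean squared error between observations y and model(x, *params)."""
    residuals = np.asarray(y, dtype=float) - model(np.asarray(x, dtype=float), *params)
    return float(np.mean(residuals ** 2))


def line(x, slope, intercept):
    """Toy model used only to exercise fit_mse."""
    return slope * x + intercept


# Residuals are [0, 0, 2], so the mean squared error is 4 / 3.
mse = fit_mse(line, (1.0, 0.0), [0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
print("mse:", mse)
```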

## Conclusion
The algebraic (alg) function has a much higher success rate than the generalized logistic (genLog) function, probably because it has fewer parameters to fit. The slight advantage of genLog over alg in mean squared error therefore does not justify using genLog instead of alg. The nature of the data, however, does not let us decide which model would work better with ideal distributions.