# Logarithm Relationship and Curve fitting in Python
## Example: Efficiency Curve: Marketing Cost vs Conversions


### Overview

---

In this notebook, I will demonstrate how you can fit a logarithmic curve to data points and extrapolate the coefficient using numpy and scipy.
Logarithm curve can be widely used in many contexts - for example, it indicates the correlation between marekting spend and conversions. Most of the time, as the spend increases, the amount of conversion increases. However, the level of increase diminishes and the curve starts to flatten at one point. The point should more or less indicate the ideal spend level as the number of sign-ups increase will be saturated at this point.
The fitted curve line is a logarithmic relation of the 2 metrics. 






### Technical Details

---
This notebook utilize Scipy.Optmize library and use a log function to fit the efficiency curve


In [2]:
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from numpy import linspace, log, sin,pi
from matplotlib import pyplot as plt
from random import random
from datetime import datetime, date
import plotly.graph_objs as go
import plotly.express as px #graphing


In [206]:
df = pd.read_csv('data.csv')

In [207]:
df['year'] = df['year']-1

In [208]:
df['month'] = pd.to_datetime(df['month']).dt.strftime("%Y-%m")
df['year'] = pd.to_datetime(df['month']).dt.strftime("%Y")

In [209]:
df['year']= df['year'].astype(str)

In [210]:
df = df[df['month']!= '2022-12']

## Demonstration
Generating random data

In [179]:
import numpy as np
def func(x, a, b, c):
    return a * np.log(b * x) + c

In [91]:
x = np.linspace(1,5,100)   # changed boundary conditions to avoid division by 0
y = func(x, 2.7, 1.3, 0.5)
yn = y + 0.3*np.random.normal(size=len(x))

array([ -16.54166107,   56.88946933,  117.64130609,    7.02734444,
        -25.71155829,  -47.19971448,   30.40884964,  -24.42085705,
        100.14787773,   57.38226066,   35.06121057,  -72.40355811,
        -10.47933133,   30.50151113,  134.65906419, -293.05012716,
          9.52084766,  -23.31967029,  -37.22857417,   19.60991521,
        185.60576438, -112.3152503 ,  111.79729587,  122.43230444,
        -29.69801159, -174.4099987 ,  -71.15402769,  105.66545764,
          9.06263988, -107.7320558 ,  -60.50723033, -108.96763179,
        -30.41311942,  118.48892448,  -46.75972598,  -51.50198593])

In [92]:
popt, pcov = curve_fit(func, x, y)

print(popt)
y_fit = np.array(func(x, *popt))

[2.7        0.54531238 2.84565402]


In [97]:
fig = go.Figure(data=go.Scatter(x=x, y=yn, mode='markers', name= 'data'))
fig.add_trace(go.Scatter(
    x=x, y=func(x, *popt),
    name='Fitted Curve'
))
fig.show()

## Efficiency Curve

In [180]:
def func(x, a, b, c):
  #return a * np.exp(-b * x) + c
  return a * np.log(b * x) + c

In [181]:
# Getting r squared value:
# https://stackoverflow.com/questions/19189362/getting-the-r-squared-value-using-curve-fit
def calculate_r_squared(x, y, popt):
    residuals = y - func(x, *popt)
    ss_res = np.sum(residuals**2) #  residulas sum of squares
    ss_tot = np.sum((y-np.mean(y))**2) #total sum of squares
    r_squared = 1 - (ss_res / ss_tot)
    return r_squared

In [182]:
df.head()
df['noise']=pd.Series(100*np.random.normal(size=len(y)))

In [211]:
df['noise']=pd.Series(200*np.random.normal(size=len(y)))
df['conversions_']=df['conversions']+df['noise']* df['multiplier']
x = np.array(df['cost'])
y = np.array(df['conversions_'])
x = np.array(df['cost'])
y = np.array(df['conversions'])

x.sort()
y.sort()

popt, pcov = curve_fit(func, x, y)
print(*popt)
y_fit = np.array(func(x, *popt))

fig = px.scatter(df, x="cost", y="conversions_", title = f"Logarithmic Growth", height=600, width=750, 
                 template = 'plotly_white', color = "year")
fig.update_traces(marker=dict(size=10))
fig.update_layout(xaxis_title="Cost", yaxis_title="Conversions",font=dict(family="Avenir",))
fig.add_trace(go.Scatter(x=x, y=func(x, *popt), mode='lines', name = 'Growth Curve', marker=dict(
                color="lightslategrey",
                size=5
            )))
#fig.update(layout_showlegend=False)

fig.show()

600.6441906726494 1.1325083237041516 -3334.349119926422



invalid value encountered in log

