## Hypothesis testing 

Aim: to find ways to generate more revenue for taxi drivers

Assumption: The sample data come from two groups of customers, where each group was required to pay either by cash or by credit card.
Without this assumption, causal conclusions cannot be drawn about how payment method affects fare amount.

### Import packages 

In [2]:
import pandas as pd
from scipy import stats

### Import data 

In [3]:
taxi_data = pd.read_csv("2017_Yellow_Taxi_Trip_Data.csv", index_col=0)

### Relationship between payment type & fare amount

In [4]:
taxi_data.groupby('payment_type')['fare_amount'].mean()

payment_type
1    13.429748
2    12.213546
3    12.186116
4     9.913043
Name: fare_amount, dtype: float64

Based on the results above, customers who paid by credit card (payment type 1) paid a higher fare on average than customers who paid in cash (payment type 2): about 13.43 versus 12.21.
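Group means alone do not show how many trips fall in each group or how spread out the fares are. Below is a minimal sketch of how count, mean, and standard deviation could be viewed together. It uses a small made-up DataFrame (`toy`) in place of the real data, so the values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for taxi_data; values are illustrative only.
toy = pd.DataFrame({
    "payment_type": [1, 1, 1, 2, 2, 2],
    "fare_amount": [10.0, 14.0, 18.0, 8.0, 12.0, 13.0],
})

# One row per payment type, with group size, mean fare, and sample standard deviation.
summary = toy.groupby("payment_type")["fare_amount"].agg(["count", "mean", "std"])
print(summary)
```

The same `agg` call should work on `taxi_data` directly, assuming the column names match those used in the cell above.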

### Hypothesis testing 

A two-sample t-test is used to investigate whether the difference in average fare amount between the two payment groups is statistically significant.

Null hypothesis: There is no difference in the average fare amount between customers who use credit cards and customers who use cash.

Alternative hypothesis: There is a difference in the average fare amount between customers who use credit cards and customers who use cash.
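The test in the next cell calls `stats.ttest_ind` with `equal_var=False`, which is Welch's t-test. To show what that statistic measures, here is a rough sketch that computes it by hand and compares it with SciPy's result. The samples `a` and `b` are hypothetical toy arrays, not values from the taxi dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical toy samples; not taken from the taxi dataset.
a = np.array([10.0, 14.0, 18.0, 15.0, 13.0])
b = np.array([8.0, 12.0, 13.0, 9.0])

# Variance of each sample mean (sample variance divided by group size).
va = a.var(ddof=1) / len(a)
vb = b.var(ddof=1) / len(b)

# Welch's t statistic: difference in means divided by its standard error.
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

# Two-sided p-value from the t distribution.
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

result = stats.ttest_ind(a=a, b=b, equal_var=False)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The manual values should agree with SciPy's up to floating-point error. This is also why the degrees of freedom reported by a Welch test are generally not a whole number.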

In [7]:
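# Welch's two-sample t-test (A/B test) on fare amount by payment type
significance_level = 0.05

# payment_type 1 = credit card, 2 = cash
credit_card = taxi_data[taxi_data['payment_type'] == 1]['fare_amount']
cash = taxi_data[taxi_data['payment_type'] == 2]['fare_amount']

# equal_var=False does not assume the two groups have equal variances
stats.ttest_ind(a=credit_card, b=cash, equal_var=False)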

TtestResult(statistic=6.866800855655372, pvalue=6.797387473030518e-12, df=16675.48547403633)

Since the p-value (about 6.8e-12) is far smaller than the significance level of 5%, we reject the null hypothesis.
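The decision rule can also be written out in code, so that the `significance_level` variable is actually compared against the p-value. This is a minimal sketch: `group_a` and `group_b` are hypothetical toy samples, not values from the taxi dataset.

```python
from scipy import stats

significance_level = 0.05

# Hypothetical toy samples; illustrative only, not from the taxi dataset.
group_a = [10.0, 14.0, 18.0, 15.0, 13.0, 16.0]
group_b = [8.0, 12.0, 13.0, 9.0, 10.0, 11.0]

result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)

# Reject the null hypothesis only when the p-value falls below the threshold.
if result.pvalue < significance_level:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"p-value = {result.pvalue:.4g}: {decision}")
```

In the notebook, the same comparison could be applied to the result of the `ttest_ind` call on `credit_card` and `cash`.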

### Conclusion 

There is a statistically significant difference in the average fare amount between customers who pay by credit card and customers who pay in cash.

### Business insights from results 

Under the assumption stated at the start, encouraging customers to pay by credit card could generate more revenue for taxi drivers.
However, the hypothesis test does not account for other likely explanations that it cannot measure, such as customers choosing a payment type for convenience or because they do not have enough cash.