# Ex. 3

## Exploring the base table
Before diving into model building, it is important to understand the data you are working with. In this exercise, you will learn how to obtain the population size, the number of targets and the target incidence from a given basetable.

## Instructions
- The basetable is loaded as a pandas dataframe basetable. Assign the number of rows to the variable population_size and print it.
- Assign the number of targets equal to one to the variable targets_count and print it.
- Print the target incidence, which is the ratio of targets_count to population_size.

In [None]:
# Assign the number of rows in the basetable to the variable 'population_size'.
population_size = len(basetable)

# Print the population size.
print(population_size)

# Assign the number of targets to the variable 'targets_count'.
targets_count = sum(basetable['target'])

# Print the number of targets.
print(targets_count)

# Print the target incidence (targets_count divided by population_size).
print(targets_count / population_size)
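
As a sanity check, the three quantities above can be reproduced on a small made-up table. The sketch below assumes a binary 0/1 `target` column; `toy_basetable` and its values are hypothetical and only stand in for the real basetable. For a 0/1 column, the column mean equals the target incidence.

```python
import pandas as pd

# Hypothetical toy basetable; the real one is provided by the exercise.
toy_basetable = pd.DataFrame({"target": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]})

population_size = len(toy_basetable)                 # number of rows
targets_count = int(toy_basetable["target"].sum())   # rows with target == 1
target_incidence = targets_count / population_size

print(population_size, targets_count, target_incidence)

# For a 0/1 target, the column mean gives the same incidence.
incidence_from_mean = toy_basetable["target"].mean()
```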

# Ex. 4

## Exploring the predictive variables
It is always useful to get a better understanding of the population. Therefore, one can have a closer look at the predictive variables. Recall that you can select a column in a pandas dataframe by indexing as follows:

`basetable["variable"]`

To count the number of occurrences of a certain value in a column, you can apply the built-in sum function to the comparison:

`sum(basetable["variable"]==value)`

In this exercise you will find out whether there are more males than females in the population.

In [None]:
# Count and print the number of females.
print(sum(basetable['gender'] == 'F'))

# Count and print the number of males.
print(sum(basetable['gender'] == 'M'))
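
The counting idea can also be sketched on made-up data. The example below assumes a `gender` column holding the values "F" and "M"; `toy_basetable` is hypothetical. It shows the comparison-and-sum approach next to pandas' `value_counts()`, which returns the count of every distinct value in one call.

```python
import pandas as pd

# Hypothetical toy data; the real basetable is provided by the exercise.
toy_basetable = pd.DataFrame({"gender": ["F", "M", "F", "F", "M"]})

# The comparison yields a boolean Series; summing it counts the True values.
females = int((toy_basetable["gender"] == "F").sum())
males = int((toy_basetable["gender"] == "M").sum())

# value_counts() returns the count of every distinct value at once.
gender_counts = toy_basetable["gender"].value_counts()
print(gender_counts)
```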

# Ex. 7

## Building a logistic regression model
You can build a logistic regression model using the module linear_model from sklearn. First, you create a logistic regression model by instantiating the LogisticRegression class:

`logreg = linear_model.LogisticRegression()`

Next, you need to feed data to the logistic regression model so that it can be fit. X contains the predictive variables, whereas y contains the target.

`X = basetable[["predictor_1","predictor_2","predictor_3"]]`

`y = basetable[["target"]]`

`logreg.fit(X,y)`

In this exercise you will build your first predictive model using three predictors.

## Instructions
- Import the module linear_model from sklearn.
- The basetable is loaded as basetable. Note that the column "gender" has been transformed to gender_F so that it can be used as a predictor. Construct a dataframe X that contains the predictors age, gender_F and time_since_last_gift.
- Construct a dataframe y that contains the target.
- Create a logistic regression model.
- Fit the logistic regression model on the given basetable.

In [None]:
# Import linear_model from sklearn.
from sklearn import linear_model

# Create a dataframe X that only contains the candidate predictors age, gender_F and time_since_last_gift.
X = basetable[['age', 'gender_F', 'time_since_last_gift']]

# Create a dataframe y that contains the target.
y = basetable[['target']]

# Create a logistic regression model logreg and fit it to the data.
logreg = linear_model.LogisticRegression()
logreg.fit(X, y)

# Ex. 8

## Showing the coefficients and intercept
Once the logistic regression model is ready, it can be interesting to have a look at the coefficients to check whether the model makes sense.

Given a fitted logistic regression model `logreg`, you can retrieve the coefficients using the attribute `coef_`. The coefficients appear in the same order as the variables were fed to the model. The intercept can be retrieved using the attribute `intercept_`.

The logistic regression model that you built in the previous exercises has been added and fitted for you in `logreg`.

## Instructions

- Assign the coefficients of the logistic regression model to the variable coef.
- Assign the intercept of the logistic regression model to the variable intercept.

In [None]:
# Construct a logistic regression model that predicts the target using age, gender_F and time_since_last_gift
from sklearn import linear_model
predictors = ["age","gender_F","time_since_last_gift"]
X = basetable[predictors]
y = basetable[["target"]]
logreg = linear_model.LogisticRegression()
logreg.fit(X, y)

# Assign the coefficients to coef and print each one next to its predictor
coef = logreg.coef_
for p,c in zip(predictors,list(coef[0])):
    print(p + '\t' + str(c))
    
# Assign the intercept to the variable intercept
intercept = logreg.intercept_
print(intercept)
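
To see how the coefficients and the intercept turn into a prediction, the sketch below recomputes the predicted probability by hand and compares it with scikit-learn's output. It assumes a binary target; the synthetic data, the names `toy` and `model`, and the `max_iter=1000` setting are illustrative assumptions, not part of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical synthetic data, generated only to make the sketch runnable.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "gender_F": rng.integers(0, 2, size=n),
    "time_since_last_gift": rng.integers(0, 2000, size=n),
})
toy["target"] = rng.integers(0, 2, size=n)

predictors = ["age", "gender_F", "time_since_last_gift"]
model = linear_model.LogisticRegression(max_iter=1000)
model.fit(toy[predictors], toy["target"])  # a Series target avoids a shape warning

# Linear score: intercept + sum(coefficient * predictor value).
score = model.intercept_[0] + toy[predictors].to_numpy() @ model.coef_[0]

# The logistic (sigmoid) function maps the score to a probability.
manual_proba = 1.0 / (1.0 + np.exp(-score))

# Compare with the probability of class 1 reported by scikit-learn.
sklearn_proba = model.predict_proba(toy[predictors])[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

A positive coefficient raises the score, and therefore the predicted probability, as the predictor increases; a negative coefficient lowers it.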

# Ex. 10

## Making predictions
>Once your model is ready, you can use it to make predictions for a campaign. It is important to always use the latest information to make predictions.

>In this exercise you will, given a fitted logistic regression model, learn how to make predictions for a new, updated basetable.

>The logistic regression model that you built in the previous exercises has been added and fitted for you in logreg.

## Instructions
- The latest data is in current_data. Create a data frame new_data that selects the relevant columns from current_data.
- Assign to predictions the predictions for the observations in new_data.

In [None]:
# Fit a logistic regression model
from sklearn import linear_model
X = basetable[["age","gender_F","time_since_last_gift"]]
y = basetable[["target"]]
logreg = linear_model.LogisticRegression()
logreg.fit(X, y)

# Create a dataframe new_data from current_data that has only the relevant predictors 
new_data = current_data[["age", "gender_F", "time_since_last_gift"]]

# Predict the class probabilities for each observation in new_data and assign them to predictions
# (one column per class, in the order given by logreg.classes_)
predictions = logreg.predict_proba(new_data)
print(predictions[0:5])
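
The shape of the `predict_proba` output can be checked on a small example. The sketch below assumes a 0/1 target; the one-predictor data and the names `X_toy`, `y_toy` and `model` are hypothetical. It shows that each row holds one probability per class and that the second column is the probability of being a target.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical one-predictor data, used only to make the sketch runnable.
X_toy = np.array([[20], [25], [30], [35], [40], [45], [50], [55]])
y_toy = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = linear_model.LogisticRegression()
model.fit(X_toy, y_toy)

proba = model.predict_proba(X_toy)

# One row per observation, one column per class, in the order of model.classes_.
print(model.classes_)
print(proba.shape)

# With classes 0 and 1, column 1 holds the probability of target == 1.
target_proba = proba[:, 1]
```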

# Ex. 11

## Donor that is most likely to donate
The predictions that result from the predictive model reflect how likely it is that someone is a target. For instance, assume that you constructed a model to predict whether a donor will donate more than 50 Euro for a certain campaign. If the prediction for a certain donor is 0.82, it means that there is an 82% chance that the donor will donate more than 50 Euro.

In this exercise you will find the donor that is most likely to donate more than 50 Euro.

Recall that you can sort a pandas dataframe df according to a certain column c using

`df_sorted = df.sort_values(["c"])`

and that you can select the first and last row of a pandas dataframe using

`first_row = df.head(1)`

`last_row = df.tail(1)`

## Instructions
- The predictions are in a pandas dataframe predictions that has two columns: the donor ID and the probability of being a target. Sort these predictions such that the donors with the lowest probability to donate come first.

- Select and print the row in this sorted dataframe that has the donor that is most likely to donate more than 50 Euro according to the model.

In [None]:
# Sort the predictions
predictions_sorted = ____.____([____])

# Print the row of predictions_sorted that has the donor that is most likely to donate
print(____.____(____))
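
The sort-then-select pattern can be sketched on a made-up table. The column names `donor_id` and `probability` and all values below are assumptions for illustration; the exercise's actual column names may differ.

```python
import pandas as pd

# Hypothetical predictions table; the column names are assumptions.
toy_predictions = pd.DataFrame({
    "donor_id": [101, 102, 103, 104],
    "probability": [0.12, 0.82, 0.05, 0.40],
})

# Ascending sort: lowest probability first, highest last.
toy_sorted = toy_predictions.sort_values(["probability"])

# The last row is the donor most likely to donate.
most_likely = toy_sorted.tail(1)
print(most_likely)
```

Because `sort_values` sorts in ascending order by default, the most likely donor ends up in the last row, which is why `tail(1)` is used rather than `head(1)`.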