# Predict - Python Data Structures

This project covers Python data structures. The raw data is transformed for you in the first section; you then need to write the 9 functions specified in the project instructions.

NB: Make sure the football_players.csv file is in the same directory as this notebook.

**PROJECT RULES**:

* You may not import any external packages other than pandas: every function must be solved WITHOUT THE USE OF ANY OTHER EXTERNAL MODULES.
* Most importantly: your functions need to return the answer (not just print it out).
* Do not add or remove any cells from this notebook. Use another notebook to experiment in (or in which to do your workings), but your submission may not have any additional cells or functions.
* Only fill in code where the #YOUR CODE tags appear. No code outside these areas (or outside the given functions) will be marked.
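
The difference between returning and printing can be sketched as follows. Both function names are hypothetical and exist only for this illustration; they are not part of the project.

```python
def shows_answer(values):
    """Print the maximum; the function itself returns None."""
    print(max(values))


def returns_answer(values):
    """Return the maximum so the caller can store or check it."""
    return max(values)


printed = shows_answer([3, 7, 5])     # displays 7, but `printed` is None
returned = returns_answer([3, 7, 5])  # `returned` holds the value 7
```

Only the second form gives the caller a value to compare against an expected answer.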

## Transform Data

### Import Data

In [1]:
import pandas as pd

# Load data - pass 'Name' as our index column
load_df = pd.read_csv('football_players.csv', index_col='Name').sample(frac=1)

# Create dataframe called df
df = pd.DataFrame(load_df)


### Create Position Type Column

In [2]:
def position_type(s):
    """Convert a string of position abbreviations into a position type:
    forward, midfielder, back or goalkeeper."""

    # s[-2] is the second-to-last character of the positions string
    if s[-2] in ('T', 'W'):
        return 'Forward'
    elif s[-2] == 'M':
        return 'Midfielder'
    elif s[-2] == 'B':
        return 'Back'
    else:
        return 'GoalKeeper'

# Create position type column
df['Preferred Positions Type'] = df['Preferred Positions'].apply(position_type)
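
As a quick check of the mapping, the sketch below re-declares `position_type` and applies it to a few sample strings. The samples are hypothetical and not taken from the dataset. They assume, as the `s[-2]` indexing suggests, that each 'Preferred Positions' value ends with a trailing space.

```python
def position_type(s):
    """Classify a positions string by its second-to-last character."""
    if s[-2] in ('T', 'W'):
        return 'Forward'
    elif s[-2] == 'M':
        return 'Midfielder'
    elif s[-2] == 'B':
        return 'Back'
    else:
        return 'GoalKeeper'


# Hypothetical sample values, each assumed to end with a trailing space.
samples = ['ST LW ', 'CDM CM ', 'CB ', 'GK ']
labels = [position_type(s) for s in samples]
print(labels)
```

Under that assumption, only the last abbreviation in the string decides the type, so a player listed as `'ST LW '` is classified by `'LW'`.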

### Transform Attribute Columns to Floats

In [3]:
# Select all attribute columns
cols = ['Overall', 'Acceleration', 'Aggression',
       'Agility', 'Balance', 'Ball control', 'Composure', 'Crossing', 'Curve',
       'Dribbling', 'Finishing', 'Free kick accuracy', 'GK diving',
       'GK handling', 'GK kicking', 'GK positioning', 'GK reflexes',
       'Heading accuracy', 'Interceptions', 'Jumping', 'Long passing',
       'Long shots', 'Marking', 'Penalties', 'Positioning', 'Reactions',
       'Short passing', 'Shot power', 'Sliding tackle', 'Sprint speed',
       'Stamina', 'Standing tackle', 'Strength', 'Vision', 'Volleys']

def to_float(x):
    """Transform an attribute value to type float."""

    if isinstance(x, int):
        return float(x)
    else:
        # For string values, keep only the first two characters
        return float(x[0:2])

df[cols] = df[cols].applymap(to_float)
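
To illustrate what the conversion does, here is a small sketch that re-declares `to_float` and runs it on hypothetical raw values. The suffixed formats (`'85+3'`, `'70-2'`) are an assumption about what the string entries might look like; the notebook does not show them.

```python
def to_float(x):
    """Convert an attribute value to float, keeping the first two characters of strings."""
    if isinstance(x, int):
        return float(x)
    return float(x[0:2])


# Hypothetical raw values: a plain int, a numeric string, and strings with a suffix.
raw = [85, '77', '85+3', '70-2']
converted = [to_float(v) for v in raw]
print(converted)
```

Because only two characters are kept, the approach relies on every string value starting with a two-digit number. A value such as `'9+1'` would raise a `ValueError`, since `'9+'` is not a valid float.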

## Function 1

Build an algorithm that identifies the nth ranked (rank) defender in the world - sorted by 'Overall' then 'Name' (both descending order)
* Under a certain age (max_age)

In [4]:
### START FUNCTION 1

def best_defender_1(rank, max_age):
    # YOUR CODE HERE
    # Players under max_age, ranked by 'Overall' then 'Name' (both descending)
    df_age = df[df['Age'] < max_age].sort_values(by=['Overall', 'Name'], ascending=False)
    df_defender = df_age[df_age['Preferred Positions Type'] == 'Back']
    # The nth ranked player is the last of the first `rank` rows
    df_rank = df_defender.head(rank).tail(1)

    return df_rank.index[0]

### END FUNCTION 1

In [5]:
best_defender_1(10, 35)

'T. Alderweireld'
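
The filter, sort and pick-nth pattern can be tried on a small toy table. This is a minimal sketch: the player names and values are hypothetical, and `nth_defender` is an illustrative helper rather than one of the project functions.

```python
import pandas as pd

# Toy data (hypothetical players), indexed by 'Name' like the notebook's df.
toy = pd.DataFrame(
    {
        'Name': ['A. Able', 'B. Baker', 'C. Cole', 'D. Dunn', 'E. Eze'],
        'Age': [24, 31, 27, 22, 29],
        'Overall': [88.0, 90.0, 88.0, 85.0, 88.0],
        'Preferred Positions Type': ['Back', 'Back', 'Back', 'Forward', 'Back'],
    }
).set_index('Name')


def nth_defender(frame, rank, max_age):
    """Return the name of the rank-th defender younger than max_age."""
    mask = (frame['Age'] < max_age) & (frame['Preferred Positions Type'] == 'Back')
    # 'Name' is the index level; sort_values accepts index level names in `by`.
    ranked = frame[mask].sort_values(by=['Overall', 'Name'], ascending=False)
    # rank is 1-based, so the nth row sits at position rank - 1.
    return ranked.index[rank - 1]


print(nth_defender(toy, 2, 30))
```

The three eligible defenders tie on 'Overall', so the descending 'Name' sort decides their order. Unlike `head(rank).tail(1)`, indexing with `rank - 1` raises an `IndexError` when fewer than `rank` players qualify, rather than silently returning the last available row.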

## Function 2

Build an algorithm that identifies the nth ranked (rank) defender in the world - sorted by 'Overall' then 'Name' (both descending order)
* Under a certain age (max_age)
* Has an aggression score below a certain level (max_aggression)
* Has a stamina score above a certain level (min_stamina)

In [6]:
### START FUNCTION 2

def best_defender_2(rank, max_age, max_aggression, min_stamina):
    # YOUR CODE HERE
    # Players under max_age, ranked by 'Overall' then 'Name' (both descending)
    df_age = df[df['Age'] < max_age].sort_values(by=['Overall', 'Name'], ascending=False)
    # Apply the aggression and stamina limits, then keep defenders only
    df_agg = df_age[df_age['Aggression'] < max_aggression]
    df_sta = df_agg[df_agg['Stamina'] > min_stamina]
    df_defender = df_sta[df_sta['Preferred Positions Type'] == 'Back']
    df_rank = df_defender.head(rank).tail(1)

    return df_rank.index[0]

### END FUNCTION 2

In [7]:
best_defender_2(10, 30, 80, 60)

'K. Walker'

## Function 3

Build an algorithm that identifies the nth ranked (rank) defender in the world - sorted by 'Overall' then 'Name' (both descending order)
* Under a certain age (max_age)
* Does not play for a certain team (team)

In [8]:
### START FUNCTION 3

def best_defender_3(rank, max_age, team):
    # YOUR CODE HERE
    # Players under max_age, ranked by 'Overall' then 'Name' (both descending)
    df_age = df[df['Age'] < max_age].sort_values(by=['Overall', 'Name'], ascending=False)
    df_defender = df_age[df_age['Preferred Positions Type'] == 'Back']
    # `team` is matched against 'Nationality', i.e. the national team
    df_new = df_defender[df_defender['Nationality'] != team]
    df_rank = df_new.head(rank).tail(1)

    return df_rank.index[0]

### END FUNCTION 3

In [9]:
best_defender_3(10, 25, 'Argentina')

'A. Rüdiger'

## Function 4

Build an algorithm that identifies the nth ranked (rank) attacker in the world - sorted by 'Overall' then 'Name' (both descending order)
* With specified attribute (attribute_name) above a threshold (min_attribute_score)

In [10]:
### START FUNCTION 4

def best_attacker_1(rank, attribute_name, min_attribute_score):
    # YOUR CODE HERE
    # Players above the attribute threshold, ranked by 'Overall' then 'Name' (both descending)
    df_a = df[df[attribute_name] > min_attribute_score].sort_values(by=['Overall', 'Name'], ascending=False)
    df_forward = df_a[df_a['Preferred Positions Type'] == 'Forward']
    df_rank = df_forward.head(rank).tail(1)

    return df_rank.index[0]

### END FUNCTION 4

In [11]:
best_attacker_1(10, 'Balance', 50)

'P. Aubameyang'

## Function 5

Build an algorithm that identifies the nth ranked (rank) attacker in the world - sorted by 'Overall' then 'Name' (both descending order)
* With the average of the specified attributes (attribute_1_name, attribute_2_name) above a threshold (min_attributes_ave)

In [12]:
### START FUNCTION 5

def best_attacker_2(rank, attribute_1_name, attribute_2_name, min_attributes_ave):
    # YOUR CODE HERE
    # Forwards ranked by 'Overall' then 'Name' (both descending)
    df_forward = df[df['Preferred Positions Type'] == 'Forward'].sort_values(by=['Overall', 'Name'], ascending=False)
    # Keep rows where the mean of the two attributes exceeds the threshold
    df_avg = df_forward[((df_forward[attribute_1_name] + df_forward[attribute_2_name]) / 2) > min_attributes_ave]
    df_rank = df_avg.head(rank).tail(1)

    return df_rank.index[0]

### END FUNCTION 5

In [13]:
best_attacker_2(10, 'Finishing', 'Balance', 80)

'S. Mané'

## Function 6

Build an algorithm that identifies the nth ranked (rank) attacker in the world - sorted by 'Overall' then 'Name' (both descending order)
* With the minimum of the specified attributes (attribute_1_name, attribute_2_name) above a threshold (min_attributes_min)

In [36]:
### START FUNCTION 6

def best_attacker_3(rank, attribute_1_name, attribute_2_name, min_attributes_min):
    # YOUR CODE HERE
    # Forwards ranked by 'Overall' then 'Name' (both descending)
    df_forward = df[df['Preferred Positions Type'] == 'Forward'].sort_values(by=['Overall', 'Name'], ascending=False)
    # The minimum of two values exceeds the threshold only if both values do
    df_min = df_forward[(df_forward[attribute_1_name] > min_attributes_min) & (df_forward[attribute_2_name] > min_attributes_min)]
    df_rank = df_min.head(rank).tail(1)

    return df_rank.index[0]
    
### END FUNCTION 6

In [37]:
best_attacker_3(10, 'Balance', 'Composure', 70)

'A. Di María'
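
The solution above tests each attribute separately instead of computing a minimum. The sketch below, on hypothetical toy data, shows that both formulations select the same rows; the player labels and values are illustrative only.

```python
import pandas as pd

# Toy attribute table (hypothetical values), indexed by 'Name'.
toy = pd.DataFrame(
    {
        'Balance': [80.0, 65.0, 75.0, 90.0],
        'Composure': [72.0, 85.0, 70.0, 71.0],
    },
    index=pd.Index(['P1', 'P2', 'P3', 'P4'], name='Name'),
)

threshold = 70

# Form 1: both attributes individually above the threshold.
both_above = (toy['Balance'] > threshold) & (toy['Composure'] > threshold)

# Form 2: the row-wise minimum of the two attributes above the threshold.
min_above = toy[['Balance', 'Composure']].min(axis=1) > threshold

print(list(toy[min_above].index))
```

Note that the comparison is strict: `P3` has a 'Composure' of exactly 70 and is excluded by both forms.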

## Function 7

Build an algorithm that identifies the best n (no_defenders) defenders - sorted by 'Overall' then 'Name' (both descending order)
* From a certain country (country)
* Under a certain age (max_age)

Your function must return a `list` of `strings`

In [38]:
### START FUNCTION 7

def best_team_1(country, no_defenders, max_age):
    # YOUR CODE HERE
    # Defenders from the requested country, ranked by 'Overall' then 'Name' (both descending)
    df_back = df[(df['Preferred Positions Type'] == 'Back') & (df['Nationality'] == country)].sort_values(by=['Overall', 'Name'], ascending=[False, False])
    df_best = df_back[df_back['Age'] < max_age].head(no_defenders)

    return list(df_best.index)
    
### END FUNCTION 7

In [39]:
best_team_1('England', 3, 30)

['K. Walker', 'N. Clyne', 'E. Dier']

## Function 8

Build an algorithm that identifies the best n (no_attackers) attackers - sorted by 'Overall' then 'Name' (both descending order)
* From a certain country (country)
* With a specified attribute (attribute_name) above a threshold (min_attribute_score)

Your function must return a `list` of `strings`

In [19]:
### START FUNCTION 8

def best_team_2(country, no_attackers, attribute_name, min_attribute_score):
    # YOUR CODE HERE
    # Forwards from the requested country, ranked by 'Overall' then 'Name' (both descending)
    df_forward = df[(df['Preferred Positions Type'] == 'Forward') & (df['Nationality'] == country)].sort_values(by=['Overall', 'Name'], ascending=[False, False])
    df_best = df_forward[df_forward[attribute_name] > min_attribute_score].head(no_attackers)

    return list(df_best.index)
    
### END FUNCTION 8

In [20]:
best_team_2('England', 3, 'Finishing', 60)

['H. Kane', 'R. Sterling', 'D. Sturridge']

## Function 9

Build an algorithm that identifies the best team based on the team structure (no_attackers, no_defenders, no_midfielders, no_goalkeepers) - sorted by 'Overall' then 'Name' (both descending order)
* From a certain country (country)

Your function must return a `list` of `strings`

In [44]:
### START FUNCTION 9

def best_team_3(country, no_attackers, no_defenders, no_midfielders, no_goalkeepers):
    # YOUR CODE HERE
    # Top n players per position type from the requested country,
    # each ranked by 'Overall' then 'Name' (both descending)
    df_fwd = df[(df['Preferred Positions Type'] == 'Forward') & (df['Nationality'] == country)].sort_values(['Overall', 'Name'], ascending=[False, False]).head(no_attackers)
    df_back = df[(df['Preferred Positions Type'] == 'Back') & (df['Nationality'] == country)].sort_values(['Overall', 'Name'], ascending=[False, False]).head(no_defenders)
    df_mid = df[(df['Preferred Positions Type'] == 'Midfielder') & (df['Nationality'] == country)].sort_values(['Overall', 'Name'], ascending=[False, False]).head(no_midfielders)
    df_gk = df[(df['Preferred Positions Type'] == 'GoalKeeper') & (df['Nationality'] == country)].sort_values(['Overall', 'Name'], ascending=[False, False]).head(no_goalkeepers)

    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    team = pd.concat([df_fwd, df_back, df_mid, df_gk])

    return list(team.index)
    
### END FUNCTION 9

In [45]:
best_team_3('England', 3, 4, 3, 1)

['H. Kane',
 'R. Sterling',
 'D. Sturridge',
 'G. Cahill',
 'K. Walker',
 'N. Clyne',
 'L. Baines',
 'D. Alli',
 'A. Lallana',
 'J. Henderson',
 'J. Hart']
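
Combining the per-position selections can also be written as a loop over the team structure. The sketch below uses hypothetical toy data; `top_n` and `structure` are names introduced for illustration only.

```python
import pandas as pd

# Toy data (hypothetical players), indexed by 'Name' like the notebook's df.
toy = pd.DataFrame(
    {
        'Overall': [90.0, 85.0, 88.0, 80.0, 83.0],
        'Preferred Positions Type': ['Forward', 'Forward', 'Back', 'Midfielder', 'GoalKeeper'],
    },
    index=pd.Index(['F1', 'F2', 'B1', 'M1', 'G1'], name='Name'),
)


def top_n(frame, position, n):
    """Return the best n rows for one position type."""
    subset = frame[frame['Preferred Positions Type'] == position]
    return subset.sort_values(by=['Overall', 'Name'], ascending=False).head(n)


# Requested structure as (position type, number of players), in output order.
structure = [('Forward', 1), ('Back', 1), ('Midfielder', 1), ('GoalKeeper', 1)]
team = pd.concat([top_n(toy, position, n) for position, n in structure])
print(list(team.index))
```

`pd.concat` keeps the frames in the order they are passed, which is what preserves the attackers, defenders, midfielders, goalkeepers ordering in the returned list.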