# Introduction

This is a hands-on lab that demonstrates the Data Analysis Process through an example. In this lab, you will perform an Exploratory Data Analysis (EDA) using the Data Analysis Process workflow.

## Objective(s)

At the end of this lab, you should be able to:

* iterate through the Data Analysis Process effectively.
* perform EDA for any organization, using the Data Analysis Process to meet its objectives.

## The Data Analysis Process

### What is the Data Analysis Process?
Simply put, the data analysis process is the collection of steps required to make sense of the available data. It summarizes the distinct but essential tasks that a data analyst must handle in order to deliver reliable output to their users.

The Data Analysis Process includes the following stages:
* Identify Objectives
* Collect Data
* Process (prepare) Data
* Perform Analysis
* Share outcome (result)

## Sample Task

You work as a Data Analyst with AZ Global, a scouting firm that recommends basketball players to NBA clubs. Your supervisor has asked you to use historical stats data to identify potential players who can be recommended to clubs for the upcoming season.

### Stage One: Identifying Objectives

At this stage, a data analyst asks the right questions in order to be clear about the task at hand.
It is important that you know how to ask the right questions and also have good listening skills. Sometimes, you will have to ask these questions of yourself based on the task sheet in front of you. If this is the case, be sure to follow up with others to clarify anything that is not obvious to you.

Remember that asking the right questions and clearly identifying business objectives will save you time and resources, and will also ensure that the result of your analysis is relevant to your client or organization.

From our sample task above, what questions are we likely to ask?
* What position is the target club seeking to recruit?
* What are the key qualities they might be interested in?
* Is there any particular interest regarding previous team history?
* Is there any particular interest regarding age, nationality, height and weight?

Once we have come up with these questions, the next step is to pass them to the supervisor for answers.


### Feedback

The supervisor has sent back a file containing answers to the questions you raised. Below are the clarifications you must consider while carrying out your analysis.
1. They want a player with a minimum of sixty (60) games played.
2. They want a player with an average of 1800 minutes played.
3. 
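
The two numeric requirements above can be sketched in pandas. This is a minimal sketch under stated assumptions: the sample rows are hypothetical stand-ins for `nba_players_full_details.csv`, "games played" is taken as the sum of `GP` across a player's rows, and "an average of 1800 minutes" is read as a minimum for the mean of `MIN` per player. The `Player`, `GP` and `MIN` column names come from the dataset; `per_player`, `total_gp`, `avg_min` and `shortlist` are illustrative names.

```python
import pandas as pd

# Hypothetical sample rows standing in for nba_players_full_details.csv
# (one row per player per season/stage, as in the real file).
df = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player B", "Player C"],
    "GP": [40, 35, 82, 15],
    "MIN": [1500.0, 2200.0, 3000.0, 127.0],
})

# Assumption: games played are summed over a player's rows, and minutes are averaged.
per_player = df.groupby("Player").agg(
    total_gp=("GP", "sum"),
    avg_min=("MIN", "mean"),
)

# Keep players with at least 60 games and an average of at least 1800 minutes.
shortlist = per_player[(per_player["total_gp"] >= 60) & (per_player["avg_min"] >= 1800)]
print(shortlist)
```

If the supervisor actually meant per-season thresholds, the rows can be filtered directly (for example `df[df["GP"] >= 60]`) instead of aggregating first.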


Deliverables:

The club expects a recommendation of 3 players who:
- have a minimum of 500 FGM (field goals made).
- have an average height of 
- have played in a top 5 team.
- have played in the regular season or internationally.
- have a minimum of 1000 points.
- have a good record of BLK (blocks).
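
Several of the deliverables translate directly into boolean filters. The sketch below is illustrative only: it uses hypothetical sample rows, applies each criterion per row (that is, per season and stage), and treats "a good record of BLK" as simply ranking by `BLK`. The height and top-5-team criteria are left out because the target height is not specified and the dataset has no team-ranking column. The `Stage` values `Regular_Season` and `International` are the ones seen in the data preview.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from nba_players_full_details.csv.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Stage": ["Regular_Season", "Regular_Season", "International", "Regular_Season"],
    "FGM": [956, 450, 520, 600],
    "PTS": [2344, 1200, 1300, 900],
    "BLK": [239, 60, 95, 150],
})

# Criteria from the deliverables list, applied row by row.
mask = (
    (df["FGM"] >= 500)                                           # minimum of 500 FGM
    & (df["PTS"] >= 1000)                                        # minimum of 1000 points
    & (df["Stage"].isin(["Regular_Season", "International"]))  # regular season or international
)

# Rank the rows that pass by blocks, as a simple stand-in for "a good record of BLK".
candidates = df[mask].sort_values("BLK", ascending=False)
print(candidates.head(3))
```

Because a player can appear in several rows, the same player may show up more than once; `candidates.drop_duplicates("Player")` keeps only each player's best-ranked row.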

## Collect Data

The dataset has been given to us as a CSV file and is now in our working directory.

## Data Processing (Prepare and Clean Data)

Experienced analysts agree that this stage is where the real work lies. It is very important that your dataset is well processed and clean before you begin any major analysis on it. Refer to [Data Analysis Process - Class](https://github.com/xtian4zy/Data-Analysis-Process---Class) for more.

So, in this step we will fix up our data by identifying missing data points, fixing incorrect data formats, and removing duplicates and unwanted data.
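
The cleaning steps listed above map onto a handful of pandas calls. This is a minimal sketch on a hypothetical frame, not the real dataset: `isna().sum()` for missing values, `pd.to_numeric` for a numeric column stored as text, `drop_duplicates()` for repeated rows, and `drop(columns=...)` for data you decide you do not need. Which column to drop is an arbitrary example choice here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicate row, missing values and a numeric column stored as text.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player B", "Player C"],
    "GP": ["79", "82", "82", "15"],                       # numbers stored as strings: an incorrect format
    "draft_team": ["Orlando Magic", np.nan, np.nan, np.nan],
})

# 1. Identify missing data points: count NaN values in each column.
print(df.isna().sum())

# 2. Fix incorrect data formats: convert GP to numbers (unparseable values would become NaN).
df["GP"] = pd.to_numeric(df["GP"], errors="coerce")

# 3. Remove duplicate rows (missing values count as equal when rows are compared).
df = df.drop_duplicates()

# 4. Remove unwanted data: drop a column the analysis does not need (example choice only).
df = df.drop(columns=["draft_team"])

print(df)
```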

##### Let's begin...
First, let's import the required libraries.

```python
import pandas as pd  # for manipulating DataFrames
import matplotlib.pyplot as plt  # for visualization
import seaborn as sns  # for visualization
pd.set_option('display.max_columns', None)  # show every column when displaying a DataFrame
```

### Let's load in our dataset

```python
df = pd.read_csv('nba_players_full_details.csv')
```

Once the data is loaded, the next step is to preview it to get a quick overview.

```python
df.head(10)  # Let's see the first ten (10) records
```

```text
Unnamed: 0,League,Season,Stage,Player,Team,GP,MIN,FGM,FGA,3PM,3PA,FTM,FTA,TOV,PF,ORB,DRB,REB,AST,STL,BLK,PTS,birth_year,birth_month,birth_date,height,height_cm,weight,weight_kg,nationality,high_school,draft_round,draft_pick,draft_team
0,NBA,1999 - 2000,Regular_Season,Shaquille O'Neal,LAL,79,3163.0,956,1665,0,1,432,824,223,255,336,742,1078,299,36,239,2344,1972.0,Mar,"Mar 6, 1972",7-1,216.0,325.0,147.0,United States,Robert G. Cole High School,1.0,1.0,Orlando Magic
1,NBA,1999 - 2000,Regular_Season,Vince Carter,TOR,82,3126.0,788,1696,95,236,436,551,178,263,150,326,476,322,110,92,2107,1977.0,Jan,"Jan 26, 1977",6-6,198.0,220.0,100.0,United States,Mainland High School,1.0,5.0,Golden State Warriors
2,NBA,1999 - 2000,Regular_Season,Karl Malone,UTA,82,2947.0,752,1476,2,8,589,739,231,229,169,610,779,304,79,71,2095,1963.0,Jul,"Jul 24, 1963",6-9,206.0,265.0,120.0,United States,Summerfield High School,1.0,13.0,Utah Jazz
3,NBA,1999 - 2000,Regular_Season,Allen Iverson,PHI,70,2853.0,729,1733,89,261,442,620,230,162,71,196,267,328,144,5,1989,1975.0,Jun,"Jun 7, 1975",6-0,183.0,165.0,75.0,United States,Bethel High School,1.0,1.0,Philadelphia Sixers
4,NBA,1999 - 2000,Regular_Season,Gary Payton,SEA,82,3425.0,747,1666,177,520,311,423,224,178,100,429,529,732,153,18,1982,1968.0,Jul,"Jul 23, 1968",6-4,193.0,180.0,82.0,United States,Skyline High School,1.0,2.0,Seattle SuperSonics
5,NBA,1999 - 2000,Regular_Season,Jerry Stackhouse,DET,82,3148.0,619,1447,83,288,618,758,311,188,118,197,315,365,103,36,1939,1974.0,Nov,"Nov 5, 1974",6-6,198.0,218.0,99.0,United States,Oak Hill Academy,1.0,3.0,Philadelphia Sixers
6,NBA,1999 - 2000,Regular_Season,Grant Hill,DET,74,2776.0,696,1422,34,98,480,604,240,190,97,393,490,385,103,43,1906,1972.0,Oct,"Oct 5, 1972",6-8,203.0,225.0,102.0,United States,South Lakes High School,1.0,3.0,Detroit Pistons
7,NBA,1999 - 2000,Regular_Season,Kevin Garnett,MIN,81,3243.0,759,1526,30,81,309,404,268,205,223,733,956,401,120,126,1857,1976.0,May,"May 19, 1976",6-11,211.0,240.0,109.0,United States,Farragut Career Academy,1.0,5.0,Minnesota Timberwolves
8,NBA,1999 - 2000,Regular_Season,Michael Finley,DAL,82,3464.0,748,1636,99,247,260,317,196,171,122,396,518,438,109,32,1855,1973.0,Mar,"Mar 6, 1973",6-7,201.0,225.0,102.0,United States,Proviso East High School,1.0,21.0,Phoenix Suns
9,NBA,1999 - 2000,Regular_Season,Chris Webber,SAC,75,2880.0,748,1548,27,95,311,414,218,264,189,598,787,345,120,128,1834,1973.0,Mar,"Mar 1, 1973",6-9,206.0,245.0,111.0,United States,Detroit Country Day School,1.0,1.0,Orlando Magic
```


At first glance there seems to be nothing obviously wrong; in particular, there are no **NaN** values. So, let's now look at the bottom 5 records.

```python
df.tail()  # By default returns the last 5 records.
```

```text
Unnamed: 0,League,Season,Stage,Player,Team,GP,MIN,FGM,FGA,3PM,3PA,FTM,FTA,TOV,PF,ORB,DRB,REB,AST,STL,BLK,PTS,birth_year,birth_month,birth_date,height,height_cm,weight,weight_kg,nationality,high_school,draft_round,draft_pick,draft_team
53944,Ukrainian-Superleague,2019 - 2020,International,Kyrylo Meshheryakov,MYK,15,127.0,7,28,2,13,3,4,5,27,4,14,18,8,1,3,19,1995.0,Aug,"Aug 17, 1995",6-6,198.0,182.0,83.0,Ukraine,,,,
53945,Ukrainian-Superleague,2019 - 2020,International,Yaroslav Kadygrob,ODE,10,81.7,5,16,4,14,1,3,3,2,2,4,6,3,0,0,15,1991.0,Oct,"Oct 28, 1991",6-3,191.0,187.0,85.0,Ukraine,,,,
53946,Ukrainian-Superleague,2019 - 2020,International,Ernesto Tkachuk,ODE,16,124.7,1,15,0,11,7,14,11,12,0,15,15,10,6,1,9,1994.0,Sep,"Sep 17, 1994",6-2,188.0,200.0,91.0,Ukraine,,,,
53947,Ukrainian-Superleague,2019 - 2020,International,Andrij Shapovalov,KHAR,12,59.2,0,8,0,7,3,6,5,6,1,4,5,3,1,0,3,1993.0,Nov,"Nov 10, 1993",6-2,188.0,171.0,78.0,Ukraine,,,,
53948,Ukrainian-Superleague,2019 - 2020,International,Dmitriy Lypovtsev,KHAR,5,86.3,1,13,0,9,0,0,7,11,2,12,14,4,2,1,2,1986.0,Oct,"Oct 10, 1986",6-8,203.0,220.0,100.0,Ukraine,,,,
```


Oops, there is going to be some work to do: these last records contain **NaN** (missing) values, for example in the `high_school`, `draft_round`, `draft_pick` and `draft_team` columns.
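
Because `head()` and `tail()` only show a few rows, counting missing values per column gives a more reliable picture than eyeballing. This is a sketch on hypothetical rows shaped like the `tail()` output above; on the real data, the same calls would run against the `df` loaded from the CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the tail() output: international players lack draft details.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Stage": ["Regular_Season", "International", "International"],
    "high_school": ["Example High School", np.nan, np.nan],
    "draft_round": [1.0, np.nan, np.nan],
    "draft_pick": [1.0, np.nan, np.nan],
})

# Number of missing values per column, keeping only the columns that have any.
missing = df.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)

# Percentage of missing values per column.
missing_pct = (df.isna().mean() * 100).round(1)
print(missing_pct)

# Missing values broken down by Stage, to see where the gaps come from.
print(df.drop(columns=["Stage"]).isna().groupby(df["Stage"]).sum())
```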

Get ready for some cleaning. Next, let's find out the column names of the dataset.

```python
df.columns
```

```text
Index(['League', 'Season', 'Stage', 'Player', 'Team', 'GP', 'MIN', 'FGM',
       'FGA', '3PM', '3PA', 'FTM', 'FTA', 'TOV', 'PF', 'ORB', 'DRB', 'REB',
       'AST', 'STL', 'BLK', 'PTS', 'birth_year', 'birth_month', 'birth_date',
       'height', 'height_cm', 'weight', 'weight_kg', 'nationality',
       'high_school', 'draft_round', 'draft_pick', 'draft_team'],
      dtype='object')
```

```python
print(len(df.columns))
```

```text
34
```
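
`len(df.columns)` only reports the number of columns. The small sketch below shows `df.shape`, which returns rows and columns together, and `df.dtypes`, which is worth checking early: in the preview, `draft_round` and `birth_year` print as floats (`1.0`, `1972.0`). Pandas stores whole-number columns that contain missing values as `float64`, which is a likely (though unconfirmed) reason here. The frame is hypothetical; the nullable `"Int64"` dtype is a standard pandas option for keeping integers alongside missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using a few of the dataset's column names.
df = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "GP": [79, 15],
    "draft_round": [1, np.nan],  # the missing value turns this whole-number column into float64
})

# df.shape returns (rows, columns); len(df.columns) gives only the second number.
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Data type of each column: draft_round shows up as float64.
print(df.dtypes)

# Nullable integer dtype: keeps whole numbers and marks the gap as <NA>.
df["draft_round"] = df["draft_round"].astype("Int64")
print(df.dtypes)
```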
