---   

<h1 align="center">Introduction to Data Analyst and Data Science for beginners</h1>
<h1 align="center">Lecture no 2.09-01(Pandas-00)</h1>

---
<h3><div align="right">Ehtisham Sadiq</div></h3>    

# _Python_Pandas_Introduction.ipynb_

<img align="center" width="700" height="700"  src="images/pandas-apps.png"  >

>-  **A Pandas DataFrame is a 2-dimensional labeled data structure (like an SQL table) with heterogeneously typed columns, having both a row index and a column index.**
>-  **In short, Pandas is a software library written for the Python programming language for `data analysis and manipulation`.**

## So, what is Pandas and how is it used in AI?

Artificial Intelligence is about executing machine learning algorithms on products that we use every day. Any ML algorithm, to be effective, needs the following prerequisite steps:
- `Data Collection` – conducting opinion surveys, scraping the internet, etc.
- `Data Handling` – viewing data as a table and cleaning it: checking spelling, removing blanks and wrong cases, removing invalid values, etc.
- `Data Visualization` – plotting appealing graphs, so that anyone who looks at the data can see what story it tells.

`Pandas` is a Python library with built-in functions to clean, transform, manipulate, visualize and analyze data. Its name comes from `Panel Data`, an econometrics term for multi-dimensional data sets.
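The `Data Handling` step above can be sketched in a few lines. This is a minimal illustration, not part of the original lecture: the `raw` table, its column names, and the rule that a negative age is invalid are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical survey responses with the problems described above:
# stray whitespace, inconsistent case, a blank entry, and an invalid age.
raw = pd.DataFrame({
    "city": [" lahore", "OKARA ", "Okara", None],
    "age": [21, -5, 16, 30],
})

# Normalise the text: trim whitespace and use a consistent case.
raw["city"] = raw["city"].str.strip().str.title()

# Remove rows with a blank city, then rows with an invalid (negative) age.
clean = raw.dropna(subset=["city"])
clean = clean[clean["age"] >= 0].reset_index(drop=True)

print(clean)
```

Only the rows that have both a city and a non-negative age remain in `clean`.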

## Key Features of Pandas
<img src="images/Python-Pandas-Features.webp" height=600px width=600px>


- It has a fast and efficient DataFrame object with default and customized indexing.
- It reshapes and pivots data sets.
- It groups data for aggregations and transformations.
- It aligns data and handles missing data in an integrated way.
- It provides time series functionality.
- It processes data sets in different formats, such as matrix data, heterogeneous tabular data, and time series.
- It handles operations on data sets such as subsetting, slicing, filtering, grouping, re-ordering, and re-shaping.
- It integrates with other libraries such as SciPy and scikit-learn.
- It provides fast performance, and you can speed it up further with Cython.
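Two of the features listed above, grouping and reshaping, can be tried on a small table. The sketch below uses made-up data; the `scores` table and its column names are assumptions for illustration only.

```python
import pandas as pd

# Illustrative data, not taken from the lecture.
scores = pd.DataFrame({
    "city": ["Lahore", "Okara", "Okara", "Lahore"],
    "subject": ["Math", "Math", "Physics", "Physics"],
    "marks": [90, 80, 70, 70],
})

# Group by: average marks per city.
mean_by_city = scores.groupby("city")["marks"].mean()
print(mean_by_city)

# Reshaping: one row per city and one column per subject.
wide = scores.pivot(index="city", columns="subject", values="marks")
print(wide)
```

`groupby` collapses the rows of each city into one aggregated value, while `pivot` rearranges the same values into a city-by-subject grid without aggregating them.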

## Data Types
A data type tells a programming language how to store and manipulate data.
- `int` : Integer number, e.g. 10, 12
- `float` : Floating point number, e.g. 100.2, 3.1415
- `bool` : True/False value
- `object` : Text, or a mix of numeric and non-numeric values, e.g. Apple
- `datetime64` : Date and time values
- `category` : A finite list of values
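To see these types in practice, the sketch below builds a small table with one column per type and prints the `dtypes` that Pandas assigns. The table and its column names are hypothetical. The exact label shown for the text column can differ between Pandas versions, so it is printed but not discussed further.

```python
import pandas as pd

# One column per data type from the list above (illustrative values).
df_types = pd.DataFrame({
    "count": [10, 12],                                      # integers
    "price": [100.2, 3.1415],                               # floating point
    "in_stock": [True, False],                              # booleans
    "fruit": ["Apple", "Mango"],                            # text
    "added": pd.to_datetime(["2021-01-01", "2021-06-15"]),  # dates
    "size": pd.Categorical(["small", "large"]),             # categories
})

# dtypes lists the data type that Pandas chose for each column.
print(df_types.dtypes)
```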

## What does Pandas deal with?
There are two major categories of data that you can come across while doing data analysis.
- One dimensional data
- Two-dimensional data

These data can be of any data type: character, number, or even an object.

> **Series in Pandas is one-dimensional data, and data frames are 2-dimensional data. A series can hold only a single data type, whereas a data frame is meant to contain more than one data type.**

![](images/dataframe.webp)

**In the example shown above, `Name` is a `Series` of the datatype `Object`, which is treated as a character array. `Age` is another Series, of the type `Integer`. `Marks` is the third Series, again of the type `Integer`. Each individual Series is one-dimensional and holds only one data type. The `dataframe` as a whole, however, is two-dimensional and `heterogeneous` in nature.**
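The difference in dimensionality can be checked directly with the `ndim` attribute. The values below are placeholders standing in for the table in the figure, not the actual figure data.

```python
import pandas as pd

# Placeholder values; the real figure data is not reproduced here.
students = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [21, 18, 16],
})

# Selecting a single column returns a Series.
age = students["Age"]

print(type(age).__name__, "has", age.ndim, "dimension and dtype", age.dtype)
print(type(students).__name__, "has", students.ndim, "dimensions")

# Each column keeps its own data type inside the DataFrame.
print(students.dtypes)
```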

## Creating Series & data frames in Python

#### Creating a simple Series

In [6]:

#importing pandas library
import pandas as pd
 
#Creating a list
name = ['Ehtisham', 'Ali', 'Ayesha', 'Dua']

#Creating a Series by passing list variable to Series() function of pandas 
name_series = pd.Series(name)

#Printing Series
print(name_series)

0    Ehtisham
1         Ali
2      Ayesha
3         Dua
dtype: object


In [9]:
# Let’s check type of Series
print("Type of name_Series is : ",type(name_series))

Type of name_Series is :  <class 'pandas.core.series.Series'>


#### Creating multiple series

In [12]:
name = ['Ehtisham', 'Ali', 'Ayesha', 'Dua']
marks = [91.5,93,80,65]
age = [21,18,16,6]

#Creating a Series by passing list variable to Series() function of pandas 
name_ser = pd.Series(name)
marks_ser = pd.Series(marks)
age_ser = pd.Series(age)

#Printing Series
print("Name Series : ", name_ser, sep="\n")
print("Marks Series : ", marks_ser, sep="\n")
print("Age Series : ", age_ser, sep="\n")

Name Series : 
0    Ehtisham
1         Ali
2      Ayesha
3         Dua
dtype: object
Marks Series : 
0    91.5
1    93.0
2    80.0
3    65.0
dtype: float64
Age Series : 
0    21
1    18
2    16
3     6
dtype: int64


#### Creating Dataframe from multiple Series 

In [13]:
#Creating a Series by passing list variable to Series() function of pandas 
name_ser = pd.Series(name)
marks_ser = pd.Series(marks)
age_ser = pd.Series(age)

# Creating a Dictionary by passing series as values of dictionary
dic = {'Name':name_ser,
      'Marks':marks_ser,
      'Age':age_ser
      }

# Create dataframe by passing dictionary to pd.DataFrame function of pandas
df = pd.DataFrame(dic)
print("Printing of DataFrame .... ")
df

Printing of DataFrame .... 


|   | Name     | Marks | Age |
|---|----------|-------|-----|
| 0 | Ehtisham | 91.5  | 21  |
| 1 | Ali      | 93.0  | 18  |
| 2 | Ayesha   | 80.0  | 16  |
| 3 | Dua      | 65.0  | 6   |


#### How to add a new column to the dataframe

In [14]:
address = pd.Series(['Lahore','Okara','Okara','Okara'])
# Creating a new column in the dataframe from a Series created using a list
df['Address'] = address
print("Printing of DataFrame .... ")
df

Printing of DataFrame .... 


|   | Name     | Marks | Age | Address |
|---|----------|-------|-----|---------|
| 0 | Ehtisham | 91.5  | 21  | Lahore  |
| 1 | Ali      | 93.0  | 18  | Okara   |
| 2 | Ayesha   | 80.0  | 16  | Okara   |
| 3 | Dua      | 65.0  | 6   | Okara   |


## Common statistical functions
- `count()` : Returns the number of non-null values
- `sum()` : Returns the sum of all values
- `mean()` : Returns the average of all values
- `median()` : Returns the median of all values
- `mode()` : Returns the mode (the most frequent value or values)
- `std()` : Returns the standard deviation
- `min()` : Returns the minimum of all values
- `max()` : Returns the maximum of all values
- `abs()` : Returns the absolute value of each element

In [17]:
print("Total number of elements in each column of dataframe ")
df.count()

Total number of elements in each column of dataframe 


Name       4
Marks      4
Age        4
Address    4
dtype: int64
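The other functions in the list work the same way. As a small sketch, the cell below applies several of them to the marks used earlier in this lecture; the Series is rebuilt here so that the example runs on its own.

```python
import pandas as pd

# Same marks as in the lecture's example.
marks = pd.Series([91.5, 93, 80, 65])

print("count  :", marks.count())
print("sum    :", marks.sum())
print("mean   :", marks.mean())
print("median :", marks.median())
print("std    :", marks.std())
print("min    :", marks.min())
print("max    :", marks.max())

# abs() works element by element, for example on the distance
# of each mark from the mean.
deviation = (marks - marks.mean()).abs()
print(deviation)
```

Each of the first seven calls reduces the Series to a single number, whereas `abs()` returns a Series of the same length.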

## Input and Output

- Often you will not create data yourself; it already exists in some form, and you want to import it to run your analysis. Pandas can do this, and it can also save your data in the format you need.
- The table below shows some of the formats supported by Pandas, with the function to read each format and the function to write it.

| Format          | Reader      | Writer    |
|-----------------|-------------|-----------|
| CSV             | read_csv    | to_csv    |
| JSON            | read_json   | to_json   |
| HTML            | read_html   | to_html   |
| Excel           | read_excel  | to_excel  |
| SAS             | read_sas    | –         |
| Python Pickle   | read_pickle | to_pickle |
| SQL             | read_sql    | to_sql    |
| Google BigQuery | read_gbq    | to_gbq    |
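As a minimal sketch of a reader/writer pair, the cell below writes a small dataframe as CSV and reads it back. An in-memory text buffer is used instead of a file on disk so that the example needs no particular file path; with a real file you would pass a file name to `to_csv` and `read_csv` instead. The names `df_out`, `df_in` and `buffer` are chosen only for this example.

```python
import io
import pandas as pd

df_out = pd.DataFrame({
    "Name": ["Ehtisham", "Ali"],
    "Marks": [91.5, 93.0],
})

# Write the dataframe as CSV text into an in-memory buffer.
# index=False leaves the row index out of the output.
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new dataframe.
buffer.seek(0)
df_in = pd.read_csv(buffer)

print(df_in)
```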

In [None]:
name = ['Alaa Abdelnaby','Zaid Abdul-Aziz','Kareem Abdul-Jabbar','Mahmoud Abdul-Rauf','Tariq Abdul-Wahad']
