# Data Analytics

![Python and Pandas!](./images/PythonAndPandas.png)

## We are going to learn about ...

- What is Pandas
- Pandas & NumPy
- What Pandas can do
- Pandas DataFrames & Series

<br>

---


## What is Pandas

- An open-source Python package that is most widely used for data science/data analysis and machine learning tasks. 
- Built on top of NumPy, which provides support for multi-dimensional arrays
- The name is derived from "panel data", an econometrics term for datasets that track the same subjects over multiple time periods, and is also a play on "Python data analysis"
- Created by Wes McKinney in 2008

## Pandas & NumPy

- NumPy is a library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
- Pandas is a high-level data manipulation tool that is built on the NumPy package
- Pandas offers an in-memory 2-D table object called a DataFrame
- A DataFrame is structured like a table or spreadsheet -- with rows and columns
- Many NumPy functions, such as `np.sqrt` or `np.log`, can be applied directly to Pandas DataFrames and Series
- Just as the "ndarray" is the foundation of NumPy, the "Series" is the core object of Pandas
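- NumPy arrays generally use less memory and can be faster than Pandas objects, because Pandas stores labels (indexes and column names) on top of the raw values
- Together, these two libraries are among the most widely used tools for data science in Python
- Pandas mainly works with tabular (labeled, possibly mixed-type) data, whereas NumPy works mainly with homogeneous numerical arrays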
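A minimal sketch of how the two libraries relate, using made-up values: a NumPy array is wrapped in a DataFrame to add labels, a NumPy function is applied to the DataFrame directly, and `to_numpy()` converts it back to a plain array.

```python
import numpy as np
import pandas as pd

# A NumPy ndarray: homogeneous numbers, addressed by position only
arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# Wrap the same values in a DataFrame to add row and column labels
df = pd.DataFrame(arr, columns=["a", "b"], index=["x", "y"])

# Each column of a DataFrame is a Series
col = df["a"]

# A NumPy function applied directly to a DataFrame; the labels are kept
roots = np.sqrt(df)

# .to_numpy() returns the underlying values as a plain ndarray
back = df.to_numpy()

print(type(col).__name__)  # Series
print(roots)
print(back.shape)          # (2, 2)
```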

## Pandas & Jupyter Notebooks

Jupyter Notebooks offer a good environment for using pandas to do data exploration and modeling, but pandas works just as well in ordinary Python scripts written in any text editor.

Jupyter Notebooks give us the ability to execute code in a particular cell as opposed to running the entire file. This saves a lot of time when working with large datasets and complex transformations. 

Notebooks also provide an easy way to visualize pandas’ DataFrames and plots.


## What can Pandas do?

Regardless of where the data comes from, Pandas supports the five key steps in processing and analyzing it: load, manipulate, prepare, model, and analyze.

What’s cool about Pandas is that it takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a DataFrame, that looks very similar to a table in a spreadsheet or statistical software (think Excel).
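As a rough illustration of loading, the sketch below reads a small CSV into a DataFrame. To keep it self-contained, the CSV text is embedded in the script and wrapped in `io.StringIO`; with a real file you would pass a path instead (the file name in the comment is hypothetical). `pd.read_csv` also reads TSV files when given `sep="\t"`, and `pd.read_sql` loads the result of a SQL query.

```python
import io

import pandas as pd

# Stand-in for a real CSV file; in practice you might write
# pd.read_csv("sales.csv")  (hypothetical file name)
csv_text = """region,month,units
North,Jan,120
South,Jan,95
North,Feb,130
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (3, 3): 3 rows, 3 columns
print(df.columns.tolist())  # ['region', 'month', 'units']
print(df.head())
```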

These capabilities are a big part of why Pandas has become one of the most popular Python tools for data analysis and manipulation.

### Pandas can do ...

|    |    |
|----|----|
| Data cleansing | Data fill |
| Data normalization | Merges and joins |
| Data visualization | Statistical analysis |
| Data inspection | Loading and saving data |
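A few of the capabilities in the table, sketched on two tiny made-up tables: a merge, a min-max normalization (one common way to normalize, chosen here for illustration), summary statistics, and a quick inspection of column types and missing values.

```python
import pandas as pd

# Two small tables that share a "store" key (made-up example data)
sales = pd.DataFrame({"store": ["A", "B", "C"], "revenue": [200.0, 150.0, 250.0]})
stores = pd.DataFrame({"store": ["A", "B", "C"], "city": ["Oslo", "Lima", "Pune"]})

# Merges and joins: combine the tables on the shared column
merged = sales.merge(stores, on="store", how="inner")

# Data normalization: min-max scale revenue into the range [0, 1]
rev = merged["revenue"]
merged["revenue_scaled"] = (rev - rev.min()) / (rev.max() - rev.min())

# Statistical analysis: count, mean, std, min, quartiles, max
summary = merged["revenue"].describe()

# Data inspection: column types and missing-value counts per column
print(merged.dtypes)
print(merged.isna().sum())
print(summary)
```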

<br>

### DataFrames & Series

DataFrames and Series are quite similar: many operations that work on one also work on the other, such as filling in null values and calculating the mean.
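A small sketch of the shared operations, with made-up values. Note that `mean()` skips missing values by default and, on a DataFrame, returns one result per column.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.nan]})

# The same method names work on both objects
s_filled = s.fillna(0)    # Series with NaN replaced by 0
df_filled = df.fillna(0)  # DataFrame with NaN replaced by 0

# mean() skips NaN by default
print(s.mean())   # 2.0
print(df.mean())  # one value per column: x -> 2.0, y -> 4.5
```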

![Pandas Series and DataFrames](./images/Pandas_series-and-dataframe.png)

#### Pandas Series

- A Pandas Series is like a column in a table
- It is a one-dimensional array holding data of any type
- If nothing else is specified, the values of the series are labeled with their index numbers -- first value has index 0, second value has index 1 etc.
- These labels can be used to access specified values in the series
- With the index argument, you can name your own labels for the indexes of your series
- When you have created labels, you can access an item by referring to the label
- You can also create a Series from a key/value object, like a dictionary; the keys become the labels
- You can combine two or more Series into a DataFrame, with each Series becoming a column
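The points above can be sketched as follows; the labels and values are made up for illustration.

```python
import pandas as pd

# Default labels: 0, 1, 2, ...
a = pd.Series([1, 7, 2])
print(a[0])              # 1

# Custom labels via the index argument
b = pd.Series([1, 7, 2], index=["x", "y", "z"])
print(b["y"])            # 7

# From a dictionary: the keys become the labels
calories = pd.Series({"day1": 420, "day2": 380, "day3": 390})
print(calories["day2"])  # 380

# Two Series combined into a DataFrame, aligned on their labels
duration = pd.Series({"day1": 50, "day2": 40, "day3": 45})
df = pd.DataFrame({"calories": calories, "duration": duration})
print(df)
```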

#### Pandas DataFrames

- A Pandas DataFrame is a 2-D data structure, like a two-dimensional array, or a table with rows and columns
- Pandas uses the `loc` attribute to return one or more specified rows
- With the index argument, you can name your own indexes
- Use the named index in the loc attribute to return the specified row(s)
- If your data sets are stored in a file, Pandas can load them into a DataFrame
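A minimal sketch of `loc` with the default and a named index, using made-up data. A single label returns one row as a Series; a list of labels returns a DataFrame.

```python
import pandas as pd

data = {"price": [3.5, 1.2, 2.0], "qty": [10, 25, 8]}

# Default integer index: one label returns that row as a Series
df = pd.DataFrame(data)
print(df.loc[0])

# A list of labels returns a DataFrame with those rows
print(df.loc[[0, 1]])

# Named index via the index argument, then look rows up by name
named = pd.DataFrame(data, index=["apple", "banana", "cherry"])
print(named.loc["banana"])
print(named.loc[["apple", "cherry"]])
```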



---

## Installing and Using Pandas

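**Remember: Pandas is a third-party package, not part of the Python standard library.**

You have to install it first. NumPy is a required dependency, and `pip` installs it automatically along with Pandas: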

```shell
pip install pandas
```

Then you have to import it at the beginning of every code file to use it:

```python
import numpy as np
import pandas as pd
```


Getting started ...
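One possible first script, assuming pandas and NumPy are installed: check the installed versions, build a small DataFrame from a dictionary (made-up values), and take a first look at it.

```python
import numpy as np
import pandas as pd

# Confirm both libraries import and see which versions are installed
print("pandas", pd.__version__)
print("numpy", np.__version__)

# A first DataFrame, built from a dictionary of equal-length lists
df = pd.DataFrame({"name": ["Ana", "Ben", "Cai"], "age": [28, 35, 41]})

print(df.shape)    # (3, 2)
print(df.dtypes)   # one type per column
print(df.tail(2))  # last two rows
```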

Working with Series...
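A short sketch of everyday Series operations on made-up scores: element-wise arithmetic, filtering with a boolean condition, and finding the largest value and its label.

```python
import pandas as pd

scores = pd.Series([72, 88, 95, 60], index=["ana", "ben", "cai", "dee"])

# Arithmetic is applied element-wise to every value (no loop needed)
curved = scores + 5

# Boolean filtering: keep only the values that meet a condition
passed = scores[scores >= 70]

print(curved["dee"])                  # 65
print(passed.index.tolist())          # ['ana', 'ben', 'cai']
print(scores.max(), scores.idxmax())  # 95 cai
```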

Working with DataFrames ...
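A short sketch of common DataFrame operations on a made-up table: selecting columns, filtering rows, grouping with aggregation, and sorting.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "player": ["ana", "ben", "cai", "dee"],
    "points": [12, 7, 9, 15],
})

# Select one column (a Series) or several columns (a DataFrame)
points = df["points"]
subset = df[["player", "points"]]

# Filter rows with a boolean condition
high = df[df["points"] > 8]

# Group rows and aggregate: total points per team (keys are sorted)
totals = df.groupby("team")["points"].sum()

# Sort rows by a column, highest first
ranked = df.sort_values("points", ascending=False)

print(totals)                     # blue 22, red 21
print(ranked["player"].tolist())  # ['dee', 'ana', 'cai', 'ben']
```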