# Essential Libraries in Python

## NumPy (Numerical Python)
- Provides the data structures, algorithms, and library glue needed for most scientific applications.
- Based around a multidimensional array object, `ndarray`.
- Used as a container for data to be passed between algorithms and libraries.
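
As a rough illustration of the `ndarray` described above, the following minimal sketch builds a small two-dimensional array and applies a couple of whole-array operations. The values and variable names are made up for the example.

```python
import numpy as np

# A 2 x 3 ndarray built from nested lists (values are arbitrary)
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

shape = data.shape            # dimensions of the array: (rows, columns)
dtype = data.dtype            # one element type is shared by the whole array

scaled = data * 10            # arithmetic applies elementwise, no explicit loop
col_sums = data.sum(axis=0)   # reduce along rows, giving one sum per column
```

Because every element shares one type and the data sits in a single block of memory, other libraries can typically consume such an array without copying it element by element.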

## pandas (panel data)
- High-level data structures and functions to make it easier to work with **tabular** data.
- **Primary Objects**:
  - `DataFrame`: A tabular, column-oriented data structure with row and column labels.
  - `Series`: One-dimensional labelled array object.
- Blends the array-computing ideas of NumPy with the data-manipulation capabilities of spreadsheets and relational databases.
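
A small sketch of the two primary objects, using an invented sales table (the column names and figures are illustrative only). It shows a `DataFrame`, a `Series` pulled out of it, a spreadsheet-style filter, and a database-style group-by.

```python
import pandas as pd

# A DataFrame: labelled columns, with a default integer row index
frame = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Lima"],
        "year": [2020, 2020, 2021, 2021],
        "sales": [10.0, 7.5, 12.0, 8.5],
    }
)

# Selecting one column yields a Series (one-dimensional, labelled)
sales = frame["sales"]

# Boolean filtering, similar to filtering rows in a spreadsheet
oslo = frame[frame["city"] == "Oslo"]

# Aggregation, similar to SQL's GROUP BY
totals = frame.groupby("city")["sales"].sum()
```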

## Matplotlib 
- Python library for producing plots and other two-dimensional data visualizations.
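
A minimal plotting sketch, assuming a non-interactive environment (hence the `Agg` backend and an in-memory buffer; in a notebook or script you might instead pass a filename to `savefig` or call `plt.show()`). The data is invented.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is required
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

# Render to an in-memory PNG; pass a path instead to write a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```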

## IPython and Jupyter
IPython is designed to maximize productivity in both interactive computing and software development.
- Encourages an *execute-explore* workflow rather than an *edit-compile-run* workflow.
- Suits data analysis, which typically involves exploration, trial and error, and iteration.

## SciPy
SciPy is a collection of packages addressing different standard problem domains in scientific computing. They include:
- `scipy.integrate`: Numerical integration routines and differential equation solvers
- `scipy.linalg`: Linear algebra routines and matrix decompositions
- `scipy.optimize`: Function optimizers and root finding algorithms
- `scipy.signal`: Signal processing tools
- `scipy.sparse`: Sparse matrices and sparse linear system solvers
- `scipy.special`: Wrapper around SPECFUN (Fortran library with common math functions)
- `scipy.stats`: Continuous and discrete probability distributions, statistical tests, and descriptive statistics
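
To give a feel for a few of these subpackages, here is a short sketch that touches `scipy.integrate`, `scipy.linalg`, and `scipy.optimize`. The toy problems are chosen only because their answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# scipy.integrate: the integral of x^2 over [0, 3] is exactly 9
area, abs_err = integrate.quad(lambda t: t ** 2, 0.0, 3.0)

# scipy.linalg: solve the linear system A @ solution = b
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

# scipy.optimize: find the root of t^2 - 2 on [0, 2], i.e. sqrt(2)
root = optimize.brentq(lambda t: t ** 2 - 2.0, 0.0, 2.0)
```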

## scikit-learn
- Machine learning toolkit for Python programmers. Includes submodules for:
  - Classification: SVM, nearest neighbors, random forest, logistic regression, etc.
  - Regression: Lasso, ridge regression, etc.
  - Clustering: k-means, spectral clustering, etc.
  - Dimensionality reduction
  - Model selection: Grid search, cross-validation, metrics
  - Preprocessing: Feature extraction, normalization
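
The sketch below combines three of the areas listed above (preprocessing, classification, and model selection) on the iris dataset bundled with scikit-learn. The choice of dataset, model, and `max_iter` value is an assumption made for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features and class labels from a small built-in dataset
X, y = load_iris(return_X_y=True)

# Preprocessing (standardization) chained with a classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: 5-fold cross-validated accuracy
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
```

Wrapping the scaler and classifier in one pipeline means the scaling is refit on each training fold, which avoids leaking information from the held-out fold.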

## statsmodels
statsmodels is a statistical analysis package. It contains submodules for classical statistics and econometrics:
- Regression Models
- Analysis of Variance (ANOVA)
- Time Series Analysis: AR, ARMA, ARIMA, VAR, etc.
- Nonparametric Methods
- Visualization of statistical model results

# General Tasks in Data Analysis
- <u>Interacting with the outside world</u>: Reading and writing with different file formats and data stores.
- <u>Preparation</u>: Cleaning, combining, normalizing, reshaping, slicing and dicing, transforming data for analysis.
- <u>Transformation</u>: Applying mathematical and statistical operations to groups of datasets to derive new datasets.
- <u>Modeling and Computation</u>: Connecting your data to statistical models, machine learning algorithms, or other computational tools.
- <u>Presentation</u>: Creating interactive or static graphical visualizations or textual summaries.
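
Several of these tasks can be sketched end to end with pandas. The example below uses an invented CSV held in memory (the store names and revenue figures are hypothetical) and skips the modeling step for brevity.

```python
import io

import pandas as pd

# Stand-in for a file on disk; note the missing revenue for store A in Feb
raw = io.StringIO(
    "store,month,revenue\n"
    "A,Jan,100\n"
    "A,Feb,\n"
    "B,Jan,80\n"
    "B,Feb,120\n"
)

# Interacting with the outside world: parse CSV text into a DataFrame
frame = pd.read_csv(raw)

# Preparation: drop rows where revenue is missing
clean = frame.dropna(subset=["revenue"])

# Transformation: derive a new dataset of mean revenue per store
per_store = clean.groupby("store")["revenue"].mean()

# Presentation: a plain-text summary
summary = per_store.to_string()
```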