*********************************************************************************************************
# A Tour of Python 3  
version 1.0.1  
Authors: Phil Pfeiffer, Zack Bunch, and Feyisayo Oyeniyi  
East Tennessee State University  
Last updated June 2021  
*********************************************************************************************************

# Appendix C - Python Libraries  
 C.1. [Standard Library Modules](#Python-Libraries-Standard-Library-Modules)  
 C.2. [Other Open Source Libraries](#Python-Libraries-Other-Libraries)  
 &ensp; C.2.1 [Data Analysis](#Python-Libraries-Other-Libraries-Data-Analysis)  
 &ensp;&ensp; C.2.1.1 [NumPy](#Python-Libraries-Data-Analysis-Libraries-NumPy)  
 &ensp;&ensp; C.2.1.2 [Pandas](#Python-Libraries-Data-Analysis-Libraries-Pandas)  
 &ensp;&ensp; C.2.1.3 [Matplotlib](#Python-Libraries-Data-Analysis-Libraries-Matplotlib)  
 &ensp;&ensp; C.2.1.4 [Seaborn](#Python-Libraries-Data-Analysis-Libraries-Seaborn)  
 &ensp;&ensp; C.2.1.5 [SciPy](#Python-Libraries-Data-Analysis-Libraries-SciPy)  
 &ensp; C.2.2 [Artificial Intelligence](#Python-Libraries-Other-Libraries-AI)  
 &ensp;&ensp; C.2.2.1 [Scikit-Learn](#Python-Libraries-AI-Scikit-Learn)  
 &ensp;&ensp; C.2.2.2 [TensorFlow](#Python-Libraries-AI-TensorFlow)  
 &ensp;&ensp; C.2.2.3 [Keras](#Python-Libraries-AI-Keras)  
 &ensp;&ensp; C.2.2.4 [Bokeh](#Python-Libraries-AI-Bokeh)  
 &ensp;&ensp; C.2.2.5 [Scrapy](#Python-Libraries-AI-Scrapy)  
 &ensp;&ensp; C.2.2.6 [BeautifulSoup](#Python-Libraries-AI-BeautifulSoup)  
 &ensp;&ensp; C.2.2.7 [NLTK](#Python-Libraries-AI-NLTK)  
 &ensp;&ensp; C.2.2.8 [XGBoost](#Python-Libraries-AI-XGBoost)  


## C.1. Standard Library Modules <a name='Python-Libraries-Standard-Library-Modules'></a>

To use a standard library module in Python, import it with Python's `import` keyword, followed by the module's name. Examples:

&ensp; `import sys`  
&ensp; `print(sys.argv)`  
&ensp; `import random`  
&ensp; `random.randrange(8)`

The [Python 3.9 standard library documentation](http://docs.python.org/3/library/) devotes 30+ sections to these supporting libraries, including libraries for addressing the following needs:
-  text processing, including regular expressions
-  binary data processing
-  auxiliary data types, including temporal types, arrays, and queues
-  numeric types, including decimal values, fractions, and random values
-  functional programming
-  file and directory access
-  data persistence
-  data compression and archiving
-  file formats
-  cryptography
-  operating system services
-  concurrent execution, including threading
-  context variables
-  interprocess communication and networking
-  Internet data handling, including e-mail
-  markup processing, including html and xml
-  Internet protocols
-  multimedia
-  internationalization
-  program frameworks, including Turtle graphics
-  GUI interfaces
-  development tools, including doctest
-  debugging and profiling
-  Python runtime services, including sys and `__future__`
-  custom Python interpreters
-  Python module importation
-  language manipulation
-  generic output formatting
-  Microsoft-, Unix-, and other-platform-specific modules

An additional section describes superseded and undocumented modules.
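
As a small illustration of a few of the categories listed above, the following sketch uses `re` (text processing), `fractions` and `decimal` (numeric types), and `collections.deque` (auxiliary data types). The sample string and variable names are made up for this example.

```python
import re
from collections import deque
from decimal import Decimal
from fractions import Fraction

# Text processing: find every run of digits in a string.
numbers = re.findall(r"\d+", "Python 3.9 was released in 2020")

# Numeric types: exact rational and decimal arithmetic.
half = Fraction(1, 3) + Fraction(1, 6)
exact_sum = Decimal("0.1") + Decimal("0.2")

# Auxiliary data types: a queue that keeps only the three newest items.
recent = deque(maxlen=3)
for item in range(5):
    recent.append(item)

print(numbers)       # ['3', '9', '2020']
print(half)          # 1/2
print(exact_sum)     # 0.3
print(list(recent))  # [2, 3, 4]
```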

## C.2. Other Open Source Libraries <a name='Python-Libraries-Other-Libraries'></a>

Other mainstream Python libraries can be obtained with `pip`, a package installer for Python that installs and manages libraries and dependencies that are not part of the standard library. `pip` comes pre-installed with Python versions 3.4 and later. To confirm that `pip` is installed on your machine, run the following command in the console:

&ensp;&ensp;&ensp; `pip --version`

If `pip` is installed, the command prints pip's version number. Otherwise, follow the steps outlined [here](https://pip.pypa.io/en/stable/installing/#installing-with-get-pip-py) to install it.

To install packages with `pip`, use the command `pip install`. This command uses the following syntax:

&ensp;&ensp;&ensp; `pip install [package name]`

For example, to install the NumPy library using `pip`, do the following:

&ensp;&ensp;&ensp; `pip install NumPy`
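
After installing a package, it can be useful to confirm from within Python that the package is actually available. The following is a minimal sketch that assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library. The helper names `is_installed` and `installed_version` are hypothetical, chosen for this example.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name):
    """Return the installed version string, or None if pip has no record of it."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# 'json' ships with Python, so it can always be imported.
print(is_installed("json"))
# Prints a version string if NumPy is installed, otherwise None.
print(installed_version("numpy"))
```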


### C.2.1  Data Analysis <a name='Python-Libraries-Other-Libraries-Data-Analysis'></a>

#### C.2.1.1  NumPy <a name='Python-Libraries-Data-Analysis-Libraries-NumPy'></a>

[NumPy](https://numpy.org/) supports n-dimensional arrays and matrices, including basic linear algebra functions on these structures. NumPy also supports Fourier transforms and advanced random number generation, and integrates with classic languages like Fortran, C, and C++.

*Pros:* NumPy's extensive collection of high-level mathematical functions supports a wide range of operations on large multidimensional arrays and matrices. NumPy arrays and matrices take up less space than Python lists. NumPy operations on these structures are faster than Python operations on list-based representations of matrices and arrays.

*Cons:* NumPy structures must be stored in contiguous areas of memory, which makes insertion and deletion operations costly.
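
The following sketch, which assumes NumPy has been installed with `pip`, illustrates the points above: vectorized arithmetic, basic linear algebra, and the fact that insertion builds a new array rather than modifying one in place. The array values are arbitrary.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# Arithmetic applies to every element at once, with no explicit loops.
scaled = a * 2 + 1

# Basic linear algebra: matrix product, determinant, and a linear solve.
product = a @ b
determinant = np.linalg.det(a)                 # approximately -2
x = np.linalg.solve(a, np.array([5.0, 11.0]))  # solves a @ x = [5, 11]

# Insertion copies the data into a new array, which is why it is costly.
original = np.array([1, 2, 4])
extended = np.insert(original, 2, 3)

print(product)
print(x)
print(original, extended)
```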

#### C.2.1.2  Pandas <a name= 'Python-Libraries-Data-Analysis-Libraries-Pandas'></a>

[Pandas](https://pandas.pydata.org/) stands for *Python Data Analysis Library*. Pandas uses series and data frames to provide fast and efficient mechanisms for managing, exploring, and manipulating data. It uses alignment and indexing to organize and label data, and supports various file formats, including JSON, CSV, Excel, and HDF5.

*Pros:* Pandas handles large datasets efficiently, while supporting flexible and customizable access to data. Its simple means of representing data makes analyzing datasets easy. It has an extensive set of features, including support for handling missing data, filtering unique values, and merging and joining datasets.

*Cons:* Pandas offers poor support for 3D matrices.
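
The features mentioned above (handling missing data, filtering unique values, and merging datasets) can be sketched as follows. The example assumes Pandas is installed; the `sales` and `regions` data frames and their contents are invented for illustration.

```python
import pandas as pd

# Hypothetical data: one 'units' value is missing.
sales = pd.DataFrame({
    "city": ["Ames", "Boone", "Ames", "Cary"],
    "units": [10.0, None, 5.0, 7.0],
})
regions = pd.DataFrame({
    "city": ["Ames", "Boone", "Cary"],
    "region": ["north", "north", "south"],
})

# Handle missing data by replacing the missing value with 0.
filled = sales.fillna({"units": 0.0})

# Filter unique values from a column.
cities = sales["city"].unique()

# Merge (join) the two data frames on their shared 'city' column.
merged = filled.merge(regions, on="city", how="left")

# Summarize the merged data by region.
totals = merged.groupby("region")["units"].sum()

print(list(cities))
print(totals)
```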

#### C.2.1.3 Matplotlib <a name='Python-Libraries-Data-Analysis-Libraries-Matplotlib'></a>

[Matplotlib](https://matplotlib.org/) is a plotting library for 2D graphics. It can be used in Python scripts, in the Python and IPython shells, in web application servers, and with various graphical user interface toolkits. It supports the creation of different kinds of plots, using an object hierarchy to structure plots. Plots are embedded in figure objects: box-like containers that can contain multiple axes objects, each of which corresponds to a plot.

*Pros:* The fine-grained control that Matplotlib provides over plots is unrivaled.

*Cons:* It's difficult to make plots interactive.
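
The figure-and-axes hierarchy described above can be sketched as follows, assuming Matplotlib is installed. The example selects the non-interactive `Agg` backend so that it runs without a display, and writes the image to an in-memory buffer rather than a file; both choices are conveniences for this sketch, not requirements.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

# One figure (the container) holding two axes objects (the plots).
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

xs = [0, 1, 2, 3, 4]
axes[0].plot(xs, [x ** 2 for x in xs], marker="o")
axes[0].set_title("Line plot")
axes[0].set_xlabel("x")
axes[0].set_ylabel("x squared")

axes[1].bar(["a", "b", "c"], [3, 1, 2])
axes[1].set_title("Bar chart")

fig.tight_layout()

# Render the whole figure as PNG data.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
print(len(fig.axes), "axes in the figure")
```

To write the image to disk instead, pass a filename such as `"plots.png"` to `fig.savefig`.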

#### C.2.1.4 Seaborn <a name='Python-Libraries-Data-Analysis-Libraries-Seaborn'></a>

[Seaborn](https://seaborn.pydata.org/) is essentially a high-level façade for the Matplotlib library. It enables users to create amplified data visuals: presentations that help to explicate data by displaying visual contexts that can unearth non-obvious correlations between variables.

*Pros:* Seaborn works well with data frames. Compared to Matplotlib, it offers higher-level interfaces and customized themes.

#### C.2.1.5 SciPy <a name='Python-Libraries-Data-Analysis-Libraries-SciPy'></a>

[SciPy](https://www.scipy.org/) is an advanced scientific computing library that builds on NumPy. It offers a full set of operations for matrix manipulation, and contains modules for optimization, linear algebra, integration, and statistics.

*Pros:* SciPy is suitable for implementing complex computations on numerical data.
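
As a brief sketch of two of the modules mentioned above, the following example integrates a function numerically and minimizes a one-variable function. It assumes SciPy is installed; the functions being integrated and minimized are arbitrary choices for illustration.

```python
import math

from scipy import integrate, optimize

# Integration: the area under sin(x) from 0 to pi is 2.
area, error_estimate = integrate.quad(math.sin, 0, math.pi)

# Optimization: (x - 3)^2 + 1 has its minimum value of 1 at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

print(round(area, 6))
print(round(result.x, 4), round(result.fun, 4))
```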

### C.2.2  Artificial Intelligence - Data Science  <a name='Python-Libraries-Other-Libraries-AI'></a>

#### C.2.2.1  Scikit-Learn <a name='Python-Libraries-AI-Scikit-Learn'></a>

[Scikit-Learn](https://scikit-learn.org/) is focused on machine learning models. Its feature extraction operations are largely intended for Natural Language Processing (NLP). It has a cross-validation feature that supports the use of multiple metrics to validate a model's accuracy.

*Pros:* Scikit-Learn's robust library is suitable for any end-to-end ML project, from research through production deployments. It is very good for working with complex data.

*Cons:* Scikit-Learn is not the right package for implementing deep learning.
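
Cross-validation with multiple metrics, as described above, can be sketched with `cross_validate`. The example assumes Scikit-Learn is installed. The choice of the bundled iris dataset, a logistic regression model, five folds, and the two metrics shown are all assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# A small dataset that ships with Scikit-Learn.
X, y = load_iris(return_X_y=True)

# max_iter is raised so the solver has room to converge on this data.
model = LogisticRegression(max_iter=1000)

# Evaluate the model on 5 folds, scoring each fold with two metrics.
scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1_macro"])

print("mean accuracy:", scores["test_accuracy"].mean())
print("mean macro F1:", scores["test_f1_macro"].mean())
```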

#### C.2.2.2 TensorFlow <a name= 'Python-Libraries-AI-TensorFlow'></a>

[TensorFlow](https://www.tensorflow.org/) is a great choice for working with machine intelligence at a production-level scale. It provides efficient support for manipulating mathematical expressions involving multi-dimensional arrays; provides good support for deep neural networks and machine learning concepts; and scales well as the number of machines and the size of data sets increase.

*Pros:* TensorFlow works effectively with single or multiple GPUs. It provides a scalable and stable interactive multiplatform programming interface. It can be used for speech and image recognition, text-based applications, time-series analysis and video detection.

*Cons:* The computation speed is slow. TensorFlow's unique structure makes it difficult to find and debug errors.

#### C.2.2.3 Keras <a name= 'Python-Libraries-AI-Keras'></a>

[Keras](https://keras.io/) provides an easier mechanism for expressing neural networks. It also provides some of the best utilities for compiling models, processing datasets, visualizing graphs, and more.

*Pros:* Keras is modular, which makes it expressive, flexible, and apt for innovative research. It runs smoothly on both CPU and GPU.

*Cons:* Compared to other libraries, Keras is relatively slow.

#### C.2.2.4 Bokeh <a name='Python-Libraries-AI-Bokeh'></a>

[Bokeh](https://bokeh.org/) allows for dynamic visualization. It renders graphics using JavaScript and HTML, making it browser-compatible. In Bokeh, graphs are built up one layer at a time.

*Pros:* Bokeh can be used for web applications with a high level of interactivity. It's also known for enabling high-performance visual presentation of large data sets in modern web browsers.

*Cons:* Making 'unconventional' charts, such as ones that simulate forces between atoms and drag atoms through space, is currently beyond Bokeh's scope.

#### C.2.2.5 Scrapy <a name= 'Python-Libraries-AI-Scrapy'></a>

[Scrapy](https://scrapy.org/) is a web crawling framework. It uses a crawler that automatically extracts data from web pages. It's specifically created for downloading, cleaning, and saving data from the web, and carries out this processing end to end. It generates feed exports in formats such as JSON, CSV, and XML.

*Pros:* Scrapy requests are scheduled and processed asynchronously. Its flexibility and versatility make it suitable for large projects.

*Cons:* Its learning curve is steep.

#### C.2.2.6 BeautifulSoup <a name='Python-Libraries-AI-BeautifulSoup'></a>

[BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) is a web scraping tool used to extract data from HTML. It is solely a parsing library.

*Pros:* It's very easy to learn and effective for small projects.

*Cons:* It's not ideal for large projects.
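
The following minimal sketch shows BeautifulSoup used purely as a parser, as described above. It assumes the library has been installed under its `pip` package name, `beautifulsoup4`, and uses Python's built-in `html.parser`. The HTML snippet is made up for the example, so no network access is needed.

```python
from bs4 import BeautifulSoup

# A small, hand-written HTML document to parse.
html = """
<html><head><title>Library Index</title></head>
<body>
<a href="https://numpy.org/">NumPy</a>
<a href="https://pandas.pydata.org/">Pandas</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Extract the page title and the target and text of every link.
title = soup.title.string
links = [(a["href"], a.get_text()) for a in soup.find_all("a")]

print(title)
for href, text in links:
    print(text, "->", href)
```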

#### C.2.2.7 NLTK <a name='Python-Libraries-AI-NLTK'></a>

[NLTK](https://www.nltk.org/), the Natural Language Tool Kit, is used mainly for solving natural language processing tasks. NLTK is a string processing library that takes strings as input and returns strings or lists of strings as output. It can be used for text tagging, stemming, classification, regression, tokenization, and corpus tree creation. It's the best-known and most complete NLP library, with many third-party extensions.

*Pros:* It supports automatic text summarization, coreference resolution, and discourse analysis, which identifies the discourse structure of connected text (discourse relationships between sentences).

*Cons:* It's difficult to learn and use, slow, and can't be used for neural network models.

#### C.2.2.8   XGBoost <a name='Python-Libraries-AI-XGBoost'></a>

[XGBoost](https://xgboost.readthedocs.io/) stands for eXtreme Gradient Boosting. XGBoost is an implementation of Gradient Boosting Machines (GBM) that is used for supervised learning. It's designed for speed and parallelization, runs on both single machines and distributed systems, and supports out-of-core computation.

*Pros:* XGBoost is very fast compared to other implementations of gradient boosting, as well as portable, flexible, and efficient. It offers parallel tree boosting, a technique for resolving various problems.