Data Science Fundamentals: Python |
[Table of Contents](../index.ipynb)
- - - 
<!--NAVIGATION-->
Module 0. **[Introduction To Python](./01_mod_intro_python.ipynb)** | [How To Run Python Code](./02_how_to_run_python_code.ipynb)

# INTRODUCTION TO PYTHON
---
Introduction to Python is a resource for students who want to learn Python as their first language.

## PYTHON VERNACULAR 
---
**PyPI**, the Python Package Index, is a repository of software for the Python programming language. PyPI helps you find and install software developed and shared by the Python community, and package authors use it to distribute their software. [See all packages here](https://pypi.org/).

An **API** is not a collection of code per se; it is more like a "protocol" specification of how various parts (usually libraries) communicate with each other. There are a few notable "standard" APIs in Python, e.g. the [DB API](https://www.python.org/dev/peps/pep-0249/).
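As a rough illustration of what "programming against an API" looks like, the sketch below uses the standard library's `sqlite3` module, which follows the DB API (connect, cursor, execute, fetch). The table name and the rows are made up for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, used only for illustration.
cur.execute("CREATE TABLE packages (name TEXT, version TEXT)")
cur.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("astropy", "4.0"), ("boto", "2.49.0")],
)
conn.commit()

# The same connect / cursor / execute / fetchall pattern applies to
# other database drivers that follow the DB API.
cur.execute("SELECT name, version FROM packages WHERE name = ?", ("boto",))
rows = cur.fetchall()
print(rows)  # [('boto', '2.49.0')]

conn.close()
```

Because the calls are defined by the specification rather than by one library, code written this way usually needs only small changes to target a different database driver.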

A **LIBRARY** is anything that is not an application. In Python, a library is a module, usually with submodules. The scope of a library varies widely: the Python standard library is vast (with quite a few submodules), while PyPI hosts many single-purpose libraries.

A **PACKAGE** is a [collection of Python modules under a common namespace](https://docs.python.org/2/tutorial/modules.html#packages). In practice, one is created by placing multiple Python modules in a directory with a special `__init__.py` module (file).
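One way to see the difference between a package and a plain module is to look at two standard library imports. This is a small sketch; `is_package` is a helper name invented for this example.

```python
import json   # a package: a directory of modules with an __init__.py
import math   # a plain module

def is_package(module):
    """Packages expose a __path__ attribute that says where their submodules live."""
    return hasattr(module, "__path__")

print(is_package(json))  # True
print(is_package(math))  # False

# Submodules are reached through the package's common namespace.
import json.decoder
print(json.decoder.JSONDecoder.__module__)  # json.decoder
```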

A **MODULE** is a [single file of Python code that is meant to be imported](https://docs.python.org/2/tutorial/modules.html#modules). This is a bit of a simplification, since in practice quite a few modules [detect when they are run as a script and do something special in that case](http://ibiblio.org/g2swap/byteofpython/read/module-name.html). A module is useful when you want to use a handy function you’ve written in several programs without copying its definition into each program.

A **SCRIPT** is a single file of Python code that is meant to be executed as the 'main' program.

If you have a set of code that spans multiple files, you probably have an **APPLICATION** instead of a script.
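The module/script distinction above can be sketched with a single file that works both ways. The file name `greetings.py` and the function names are hypothetical; the `__name__ == "__main__"` check is the usual way a module detects that it is being run as a script.

```python
# Contents of a hypothetical file named greetings.py

def greet(name):
    """Return a greeting; other programs can import and reuse this function."""
    return f"Hello, {name}!"

def main():
    """Entry point used only when the file is executed as a script."""
    print(greet("world"))

# __name__ is "__main__" when the file is run directly (python greetings.py)
# and "greetings" when it is imported (import greetings).
if __name__ == "__main__":
    main()
```

Another program could then write `from greetings import greet` and call `greet("Ada")` without triggering `main()`.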

**PIP** is the de facto standard package-management system used to install and manage software packages written in Python. The name, chosen by pip's creator Ian Bicking, is a recursive acronym for **"Pip Installs Packages"**.

## WHAT ARE THE LATEST AND GREATEST VERSIONS OF PYTHON?
---

[Python 2.7.18 is the absolute last official release for Python 2](https://stackoverflow.blog/2020/04/23/the-final-python-2-release-marks-the-end-of-an-era/).  No more Python 2 versions will be created.

At the time of writing (2020), the latest Python 3 release is [Python 3.8.2](https://www.python.org/downloads/release/python-382/).

## WHY ARE THERE TWO VERSIONS OF PYTHON?
---

There are three main reasons why both Python 2.7.18 and Python 3.8.2 exist.

- **Reason 1.** 

Provide proper support for international characters. Python 1.x supported only 8-bit strings, so a program could work with, say, English, Russian, or Vietnamese text, but could not mix Vietnamese and Russian letters in the same string, and Python didn’t track encodings for you.

Python 2.0 introduced Unicode, which let you work with international text, but it was a bit of an add-on.

With Python 3.0, normal strings are fully international Unicode strings, and 8-bit strings have their own types.

Python 3.0 also got rid of design mistakes, such as integer division being floor division, and `print` and `exec` being statements rather than functions.

- **Reason 2.**

Other changes occurred, such as standard library cleanups and changes in the C extension API.

Since this meant that the effort of porting from 2.x to 3.y was much higher than previous 2.x to 2.y upgrades, it was decided to maintain 2.7 for a long time.

For many years, developers were effectively waiting for each other. There was no point supporting 3.x in your libraries if few people used it, and there was no point starting to use 3.x if the libraries you wanted weren’t ported.

- **Reason 3.** 

By now, everything you might need is likely to have been ported, and it’s really just laziness that stops most Python 2.7.18 developers from using 3.8.2.
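The Python 3 behaviours mentioned under Reason 1 can be checked in a few lines. This is a minimal sketch to run under Python 3; the sample text and variable names are made up for illustration.

```python
# A single str can mix scripts, because Python 3 strings are Unicode.
mixed = "Tiếng Việt и русский"

# 8-bit data has its own type: encoding a str produces bytes.
encoded = mixed.encode("utf-8")
print(type(mixed).__name__, type(encoded).__name__)  # str bytes

# Decoding with the same encoding restores the original text.
print(encoded.decode("utf-8") == mixed)  # True

# "/" is true division; "//" is the explicit floor division operator.
print(7 / 2)   # 3.5
print(7 // 2)  # 3

# print and exec are ordinary functions, so they take arguments.
namespace = {}
exec("answer = 6 * 7", namespace)
print(namespace["answer"])  # 42
```

Under Python 2, `7 / 2` would give `3` and `print` would be a statement, which is the kind of difference that made porting more work than an ordinary 2.x upgrade.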

**[See this cheat sheet on Python compatibility](http://python-future.org/compatible_idioms.pdf)**

INSTALLING PYTHON DISTRIBUTION PACKAGES
---

Installing a module with pip fetches the latest version of the module and its dependencies from the [Python Package Index](https://pypi.python.org/pypi). See also the [100/1000/4000 most downloaded packages over the last 365 days](https://hugovk.github.io/top-pypi-packages/).

It’s also possible to specify an exact or minimum version directly on the command line. When using comparison operators such as > or <, or any other special character that the shell interprets, enclose the package name and version in double quotes.

Normally, if a suitable module is already installed, attempting to install it again has no effect. Upgrading existing modules must be requested explicitly.
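The three cases above (latest version, exact or minimum version, explicit upgrade) can be sketched as follows. This copy of the notebook does not show the original commands, so the sketch assumes the standard `python -m pip install` forms; `SomePackage` is a placeholder name and `pip_install_args` is a helper invented for this example. The code only builds and prints the commands; it does not install anything.

```python
import shlex
import sys

def pip_install_args(requirement, upgrade=False):
    """Build the argument list for `python -m pip install` (hypothetical helper)."""
    args = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        args.append("--upgrade")
    args.append(requirement)
    return args

# "SomePackage" is a placeholder, not a real project.
latest = pip_install_args("SomePackage")              # latest version
exact = pip_install_args("SomePackage==1.0.4")        # exact version
minimum = pip_install_args("SomePackage>=1.0.4")      # minimum version
upgrade = pip_install_args("SomePackage", upgrade=True)

for args in (latest, exact, minimum, upgrade):
    # shlex.join quotes any argument containing shell-special characters
    # such as ">", so the printed line is safe to paste into a POSIX shell.
    print(shlex.join(args))

# To actually run a command from Python, pass the list to subprocess.run,
# e.g. subprocess.run(latest, check=True).
```

Passing the command as a list avoids shell quoting altogether; quoting only matters when the command is typed into a shell by hand.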

Top 20 Python Libraries
---

1. <b>Requests.</b> The most famous HTTP library, written by Kenneth Reitz. It’s a must-have for every Python developer.

2. <b>Scrapy.</b> If you are involved in web scraping, this is a must-have library for you. After using this library you won’t use any other.

3. <b>Plotly.</b> Python plotting library for collaborative, interactive, publication-quality graphs.

4. <b>Pillow.</b> A friendly fork of PIL (Python Imaging Library). It is more user friendly than PIL and is a must have for anyone who works with images.

5. <b>SQLAlchemy.</b> A database library. Many love it and many hate it. The choice is yours.

6. <b>BeautifulSoup.</b> I know it’s slow, but this XML and HTML parsing library is very useful for beginners.

7. <b>Twisted.</b> The most important tool for any network application developer. It has a very beautiful API and is used by a lot of famous Python developers.

8. **[NumPy](https://numpy.org/). Module 9** NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

9. <b>SciPy.</b> When we talk about NumPy, we have to talk about SciPy. It is a library of algorithms and mathematical tools for Python and has caused many scientists to switch from Ruby to Python.

10. **[matplotlib](https://matplotlib.org/). Module 11** A numerical plotting library. It is very useful for any data scientist or data analyst.

11. <b>Boto.</b> Boto is a Python package that provides interfaces to Amazon Web Services. 

12. <b>simplejson.</b> simplejson is a simple, fast, complete, correct and extensible JSON <http://json.org> encoder and decoder for Python 2.5+ and Python 3.3+.

13. <b>PyQt.</b> A GUI toolkit for Python. It is my second choice after wxPython for developing GUIs for my Python scripts.

14. <b>PyGTK.</b> Another Python GUI library. It is the library with which the famous BitTorrent client was created.

15. <b>Scapy.</b> A packet sniffer and analyzer for Python, written in Python.

16. <b>seaborn.</b> A Python library that provides statistical data visualization.

17. <b>nltk.</b> The Natural Language Toolkit. I realize most people won’t be using this one, but it’s generic enough. It is a very useful library if you want to manipulate strings, but its capabilities go well beyond that. Do check it out.

18. <b>nose.</b> A testing framework for Python that has been widely used by Python developers. It is worth a look if you do test-driven development.

19. <b>SymPy.</b> SymPy can do algebraic evaluation, differentiation, expansion, complex numbers, etc. It is contained in a pure Python distribution.

20. **[scikit-learn](https://scikit-learn.org/stable/). Module 12** Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. 

## Honorable Mentions
---

- **[Pandas](https://pandas.pydata.org/). Module 10.** Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.

![caption](../files/pip.png)

If packages are already installed, feel free to upgrade them using pip's `--upgrade` option.

Also, don't forget to upgrade pip itself in the same way.
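As a hedged sketch of the upgrade step, the snippet below checks which version of pip is currently installed and builds the command that would upgrade it. It assumes Python 3.8 or later for `importlib.metadata`; `installed_version` is a helper name invented for this example, and nothing is installed or upgraded when the code runs.

```python
import sys
from importlib import metadata

def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Shows the current pip version, or None if pip is not installed here.
print("pip version:", installed_version("pip"))

# The standard way to upgrade pip with the interpreter that is running this code.
upgrade_pip = [sys.executable, "-m", "pip", "install", "--upgrade", "pip"]
print(" ".join(upgrade_pip))
```

Using `sys.executable -m pip` rather than a bare `pip` helps ensure the upgrade applies to the same Python environment the notebook kernel is using.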

To list all packages installed locally, issue the following command:

In [2]:
pip list

Package                            Version            
---------------------------------- -------------------
alabaster                          0.7.12             
anaconda-client                    1.7.2              
anaconda-navigator                 1.9.12             
anaconda-project                   0.8.3              
appdirs                            1.4.4              
applaunchservices                  0.2.1              
appnope                            0.1.0              
appscript                          1.0.1              
argh                               0.26.2             
asn1crypto                         1.3.0              
astroid                            2.3.3              
astropy                            4.0                
atomicwrites                       1.3.0              
attrs                              19.3.0             
autopep8                           1.4.4              
Babel                              2.8.0        

Note: you may need to restart the kernel to use updated packages.


To list all package versions installed locally, issue the following command:

In [3]:
pip freeze

alabaster==0.7.12
anaconda-client==1.7.2
anaconda-navigator==1.9.12
anaconda-project==0.8.3
appdirs==1.4.4
applaunchservices==0.2.1
appnope==0.1.0
appscript==1.0.1
argh==0.26.2
asn1crypto==1.3.0
astroid==2.3.3
astropy==4.0
atomicwrites==1.3.0
attrs==19.3.0
autopep8==1.4.4
Babel==2.8.0
backcall==0.1.0
backports.functools-lru-cache==1.6.1
backports.shutil-get-terminal-size==1.0.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.8.2
bitarray==1.2.1
bkcharts==0.2
bleach==3.1.0
bokeh==1.4.0
boto==2.49.0
Bottleneck==1.3.2
certifi==2019.11.28
cffi==1.14.0
cfgv==3.1.0
chardet==3.0.4
Click==7.0
cloudpickle==1.3.0
clyent==1.2.2
colorama==0.4.3
conda==4.8.3
conda-build==3.18.11
conda-package-handling==1.6.0
conda-verify==3.4.2
contextlib2==0.6.0.post1
cryptography==2.8
cycler==0.10.0
Cython==0.29.15
cytoolz==0.10.1
dask==2.11.0
decorator==4.4.1
defusedxml==0.6.0
diff-match-patch==20181111
distlib==0.3.1
distributed==2.11.0
docutils==0.16
entrypoints==0.3
et-xmlfile==1.0.1
fa
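A similar `name==version` listing can be produced from inside Python. This is a sketch assuming Python 3.8 or later, where `importlib.metadata` is part of the standard library; `freeze_lines` is a hypothetical helper, and its output is not guaranteed to match `pip freeze` line for line.

```python
from importlib import metadata

def freeze_lines():
    """Return 'name==version' strings for installed distributions, sorted by name."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)

# Print only the first few entries to keep the output short.
for line in freeze_lines()[:5]:
    print(line)
```

The result depends entirely on the environment in which the code is run, so no sample output is shown here.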

![caption](../files/top10.png)

- - - 
<!--NAVIGATION-->
Module 0. **[Introduction To Python](./01_mod_intro_python.ipynb)** | [How To Run Python Code](./02_how_to_run_python_code.ipynb)
<br>
[Top](#)

- - -

Copyright © 2020 Qualex Consulting Services Incorporated.