# Advanced Python for Data Science

Ethan Swan

Jan 24 & 25, 2020

## Introductions

<h2>About Me</h2>
<img src="images/ethan-headshot.gif" alt="ethan-headshot.gif" width="300" height="300" style="display:inline;float:left;padding:40px;">
<br><br>
<h3 style="display:inline;">Professional</h3>
<br><br>
Lead Data Scientist and Technology Specialist -- 84.51&deg;
<br><br>
<hr>
<h3 style="display:inline;">Academic</h3>
<br><br>
BS in Computer Science, University of Notre Dame <br><br>
MBA, Concentration in Business Analytics, University of Notre Dame

#### Contact

* Website: [ethanswan.com](http://www.ethanswan.com/)
* GitHub: [eswan18](https://github.com/eswan18/)
* Twitter: [@eswan18](https://twitter.com/eswan18)
* LinkedIn: [Ethan Swan](https://linkedin.com/in/ethanpswan)
* Email: [ethanpswan@gmail.com](mailto:ethanpswan@gmail.com)

## Your Turn

We'll go around the room. Please share:
- Your name
- Your job or field
- How you use Python now or would like to in the future

## Course

## Course Objectives

The following are the primary learning objectives of this course:

- Develop an intuition for what problems are suited to deep learning- and/or NLP-based solutions.

- Build familiarity with the basic interfaces of key Python libraries for deep learning and NLP: Keras, FuzzyWuzzy, and gensim.

- Gain a high-level understanding of the function of data science-adjacent technologies that students will encounter in the workplace, focusing on Git and GitHub.

## Course Agenda

### Day 1

| Topic                                                                          | Time        |
| :----------------------------------------------------------------------------- | ----------: |
| Breakfast / Social Time                                                        | 8:00-9:00   |
| Introductions                                                                  | 9:00-9:15   |
| Refresher on Key Python & Pandas Concepts                                      | 9:15-9:45   |
| Why Do We Need Deep Learning?                                                  | 9:45-10:00  |
| Why Use Python for Deep Learning?                                              | 10:00-10:30 |
| Break                                                                          | 10:30-10:45 |
| Overview of Keras and Tensorflow                                               | 10:45-12:00 |
| Lunch                                                                          | 12:00-1:00  |
| Walkthrough of Example Using Keras                                             | 1:00-1:45   |
| High-level Discussion of Deep Learning -- How Does It Work?                    | 1:45-2:30   |
| Break                                                                          | 2:30-2:45   |
| Deep Learning Case Study                                                       | 3:00-4:00   |
| Deep Learning Case Study Review; Q&A                                           | 4:00-4:30   |

### Day 2

| Topic                                                                    | Time        |
| :----------------------------------------------------------------------- | ----------: |
| Breakfast / Social Time                                                  | 8:00-9:00   |
| What is NLP and What Problems Can It Solve?                              | 9:00-9:30   |
| Popular NLP Packages in Python                                           | 9:30-9:45   |
| Break                                                                    | 9:45-10:00  |
| Overview of the FuzzyWuzzy Package                                       | 10:00-10:30 |
| Walkthrough of Example Using FuzzyWuzzy                                  | 10:30-11:00 |
| Overview of Word2Vec and gensim Package                                  | 11:00-12:00 |
| Lunch                                                                    | 12:00-1:00  |
| Walkthrough of Example Using gensim                                      | 1:00-1:45   |
| Discussion of Git, GitHub, and Other Data Science-adjacent Tools         | 1:45-2:30   |
| NLP Case Study                                                           | 2:30-3:30   |
| Case Study Review; Q&A; Wrap-up                                          | 3:30-4:30   |

## Course Philosophy

Beginners typically need the instructor to make connections and solve problems for them.

*Why is this code not running?
What types of real world problems could I use this package for?*

But as intermediate to advanced users, I believe you'll be more capable of seeing those connections yourselves.
Instead of diving into details and working through small code examples, this advanced workshop takes a slightly different approach...

- **Give you an overview of the tools you might need to solve a problem**. I can't teach you deep learning or NLP in just two days, but I *can* give you a foundation. And as experienced coders, you'll be able to fill in the details yourselves when the time comes to use these tools.

- **Explain more of the intuition behind tools and techniques**. Beginners can't yet see the forest for the trees -- they are caught up in small problems and not yet ready to understand the big picture. But in this class we will talk more about general design patterns of Python and its libraries, in a way that should help you *learn them* instead of simply memorizing functions.

- **Expect you to help yourself**. I'll still be here to answer questions and help with hard problems, but the mark of an experienced programmer is that they consult references often (Google, documentation, etc.) and can find answers there. You'll need to do that during this course and afterward when you apply the techniques we discuss.

## Prerequisites

### Knowledge

#### Python

- If you're attending this class, it's assumed you're comfortable with the material covered in the [Introduction to Python for Data Science](https://github.com/uc-python/intro-python-datasci) and [Intermediate Python for Data Science](https://github.com/uc-python/intermediate-python-datasci) classes.
- At a very high level, those courses covered:
    - Importing data into and exporting data out of Python, via Pandas
    - Wrangling data in Python with Pandas
    - Basics of visualization with Seaborn
    - Control flow
    - Writing functions
    - Conda environments
    - Running Python outside of Jupyter notebooks
    - Basics of modeling with scikit-learn

#### Technology

* If you're attending this class, it's assumed you're comfortable with launching and using Python via Jupyter Notebooks -- and ideally outside of Jupyter as well.
* Course materials (slides, case studies, etc.) will be in Jupyter Notebooks, but you're free to use your IDE of choice when completing exercises and case studies.

### Technology Installation

- Unlike my other courses, Advanced Python is not designed with Binder in mind.
- This means that you'll need to use your personal laptop to run today's code.
- Why? We're going to be working with bigger data and more computationally intensive algorithms, for which Binder is not well equipped.
    - In an industry setting, using these techniques would best be done on a *server*, not a personal computer.

#### Anaconda

* Anaconda is the easiest way to install Python 3 and Jupyter.
* If you have not yet installed Anaconda, please follow the [directions in the course README](https://github.com/uc-python/advanced-python-datasci).
* Be sure that all Python packages mentioned in the README are also installed: `pandas`, `scikit-learn`, `seaborn`, `keras`, `fuzzywuzzy`, and `gensim`.
* This Anaconda installation will not be able to natively display the course content as slides, but I recommend using it for completing exercises and the case studies.
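
One quick way to confirm the packages above are available is to ask Python whether it can locate each one. The snippet below is a minimal sketch rather than part of the official setup instructions; the helper name `find_missing` is made up for illustration, and the list uses *import* names, which is why scikit-learn appears as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Note that scikit-learn
# is imported as "sklearn", not "scikit-learn".
REQUIRED_MODULES = ["pandas", "sklearn", "seaborn", "keras", "fuzzywuzzy", "gensim"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    # find_spec returns None when a top-level module is not installed;
    # it does not actually import the module, so this check is fast.
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All course packages were found.")
```

Run this in the same environment (and the same Jupyter kernel) you plan to use for the exercises. If anything is reported as missing, revisit the installation directions before the case studies.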

#### JupyterLab
- If you took the introductory and/or intermediate courses, you may have used Jupyter Notebooks to write Python.
- The Jupyter project is gradually shifting its focus from the classic Notebook interface to a newer product called JupyterLab.
- JupyterLab is extremely similar but supports more features, and it is where most new development effort is going.
- I recommend using JupyterLab today even if you haven't used it before -- it comes packaged with Anaconda and should feel very familiar!

## Course Materials

* All of the material for this course can be reached from the [GitHub repository](https://github.com/uc-python/advanced-python-datasci).
* This repository contains the slides and notebooks.
* You should download the material -- available via [this link](https://github.com/uc-python/advanced-python-datasci/archive/master.zip) -- and open it via Anaconda Navigator and Jupyter Notebooks/Lab.

### Slides *and* Notebooks

- I'll be showing the material in slide format most of the time.
- These slides contain the same content as your notebooks, so you can follow along and run cells as we go.

## Questions

Are there any questions before moving on?