
Advanced Python for Data Science

Course Description

This is a two-day course that introduces how to use Python for advanced data science tasks, such as deep learning and natural language processing (NLP). Most of the time will be spent working through example problems end-to-end in the classroom. Students will learn the fundamentals of the Keras package (for deep learning) and will explore several NLP packages and methodologies to see the strengths of each. Some additional time will be reserved for discussion of real programming challenges students have encountered, and for an overview of related technologies students may need in an industry setting (e.g. Git and GitHub).

Objectives

  1. Develop an intuition for what problems are suited to deep learning- and/or NLP-based solutions.
  2. Build familiarity with the basic interfaces of key Python libraries for deep learning and NLP: Keras, FuzzyWuzzy, and gensim.
  3. Gain a high-level understanding of the function of data science-adjacent technologies that students will encounter in the workplace, focusing on Git and GitHub.
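To give a flavor of the fuzzy string matching covered on Day 2, here is a minimal sketch that uses the standard library's `difflib` instead of FuzzyWuzzy itself, so it runs before anything is installed. The function names `similarity` and `best_match` and the list of cities are made up for illustration, and the scores are not claimed to match what FuzzyWuzzy would return.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query, choices):
    """Return the choice that is most similar to the query string."""
    return max(choices, key=lambda choice: similarity(query, choice))


# Hypothetical lookup list and a misspelled query.
cities = ["Cincinnati", "Cleveland", "Columbus"]
print(best_match("cincinatti", cities))
```

The idea is that a misspelled or inconsistently formatted value can still be linked to its intended entry by picking the highest-scoring candidate.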

Prerequisites

  • Strong understanding of core Python concepts: variables, loops, conditionals, and functions
  • Some experience using Jupyter Notebooks or Jupyter Lab
  • Solid grasp of Pandas and how to use it for data manipulation: filtering, selecting, aggregating, slicing (indexing), and updating
  • High-level understanding of modeling concepts: training and test data, model accuracy, and overfitting
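As a rough self-check on the modeling concepts in the last prerequisite, the sketch below uses only the standard library. The toy data, the helper names (`train_test_split`, `accuracy`, `predict`), and the memorizing "model" are all assumptions made for illustration, not course material. If the code reads comfortably, you are likely prepared.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data: (feature, label) pairs where the label is 1 when the feature exceeds 5.
data = [(x, int(x > 5)) for x in range(12)]
train, test = train_test_split(data)

# A "model" that memorizes the training set and guesses 0 for unseen inputs.
memory = dict(train)


def predict(x):
    return memory.get(x, 0)


train_acc = accuracy([predict(x) for x, _ in train], [y for _, y in train])
test_acc = accuracy([predict(x) for x, _ in test], [y for _, y in test])
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A memorizing model always scores perfectly on the data it was trained on, while its score on held-out data depends on how well its fallback guess happens to fit. A gap between the two numbers is the signature of overfitting.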

Tentative Agenda

Exact timing TBD

Day 1

| Topic | Time |
| --- | --- |
| Breakfast / Social Time | 8:00-9:00 |
| Introductions | 9:00-9:15 |
| Refresher on Key Python & Pandas Concepts | 9:15-9:45 |
| Why Do We Need Deep Learning? | 9:45-10:00 |
| Why Use Python for Deep Learning? | 10:00-10:30 |
| Break | 10:30-10:45 |
| Overview of Keras and Tensorflow | 10:45-12:00 |
| Lunch | 12:00-1:00 |
| Walkthrough of Example Using Keras | 1:00-1:45 |
| High-level Discussion of Deep Learning -- How Does It Work? | 1:45-2:30 |
| Break | 2:30-2:45 |
| Deep Learning Case Study | 3:00-4:00 |
| Deep Learning Case Study Review; Q&A | 4:00-4:30 |

Day 2

| Topic | Time |
| --- | --- |
| Breakfast / Social Time | 8:00-9:00 |
| What is NLP and What Problems Can It Solve? | 9:00-9:30 |
| Popular NLP Packages in Python | 9:30-9:45 |
| Break | 9:45-10:00 |
| Overview of the FuzzyWuzzy Package | 10:00-10:30 |
| Walkthrough of Example Using FuzzyWuzzy | 10:30-11:00 |
| Overview of Word2Vec and gensim Package | 11:00-12:00 |
| Lunch | 12:00-1:00 |
| Walkthrough of Example Using gensim | 1:00-1:45 |
| Discussion of Git, GitHub, and Other Data Science-adjacent Tools | 1:45-2:30 |
| NLP Case Study | 2:30-3:30 |
| Case Study Review; Q&A; Wrap-up | 3:30-4:30 |

Course Preparation

You will need to install Python, Jupyter, and the relevant libraries on your personal computer for this workshop. I also recommend downloading the course materials.

See below for instructions on doing so.

1. Python, Jupyter, and Package Installation

The easiest way to install Python, Jupyter, and the necessary packages is through Anaconda. To download and install Anaconda:

  1. Visit the Anaconda download page
  2. Select your appropriate operating system
  3. Click the "Download" button for Python 3.7 - this will begin to download the Anaconda installer
  4. Open the installer when the download completes, and then follow the prompts. If you are prompted about installing PyCharm, elect not to do so.
  5. Once installed, open the Anaconda Navigator and launch a Jupyter Notebook to ensure it works.
  6. Follow the package installation instructions to ensure pandas, keras, seaborn, scikit-learn, fuzzywuzzy, and gensim packages are installed.
    • Note that fuzzywuzzy may need to be installed from a non-standard "channel", or package source -- its channel is called conda-forge. If you have trouble installing fuzzywuzzy, I'll be able to help in class.
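Once the steps above are done, one way to confirm that the packages are available is a short check like the sketch below, run in a Jupyter Notebook cell. The helper name `find_missing` is made up for illustration. The module list uses import names, which are assumed to match the install names except for scikit-learn, which is imported as `sklearn`.

```python
import importlib.util

# Import names for the packages listed above. Note that scikit-learn
# is imported as "sklearn"; the others import under their install names.
REQUIRED_MODULES = ["pandas", "keras", "seaborn", "sklearn", "fuzzywuzzy", "gensim"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All workshop packages were found.")
```

If fuzzywuzzy shows up as missing, installing it from the conda-forge channel (for example with `conda install -c conda-forge fuzzywuzzy`) will likely resolve it.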

2. Download Class Materials

There are two ways to download the class materials:

  1. Clone it - If you're familiar with using Git, I recommend cloning the repo.
  2. Download the files as a zip - This will allow you to download a static copy of the files here, but in order to get any updates you'll need to redownload the entire repo. Use this link.

About Your Instructor

I'm a Lead Data Scientist at 84.51° and an adjunct instructor at the University of Cincinnati. I've been teaching classes professionally on Python, Linux, and Spark for over three years.

If you have any issues or concerns, feel free to contact me via email at ethanpswan@gmail.com.
