Lecture notes, readings, code samples and resources for Brad Flaugher's Data-Focused Programming Bootcamp
- (Required) Install Ubuntu Linux on your PC: see the Install Guide and the additional notes for dual-booting with Windows. NOTE: if you have an M1 Mac this will be almost impossible, so sell that thing or talk to Brad.
- (Required) Install Docker: Install Docker on Ubuntu
- (Required) Read Command Line for Beginners
- (Required) Read Chapters 1-6 of Automate The Boring Stuff with Python
- (Required) If you have never used a Jupyter notebook before, please read Jupyter Notebooks Getting Started Tutorial
- (Required) Learn how to use git if you never have before: Git and GitHub for beginners crash course
- (Recommended) Join the bootcampers LinkedIn Group
- Note 1: Lectures are a small part of the course; most of a bootcamper's time will be spent working on their final portfolio project.
- Note 2: The 6-week course is broken into numeric and alphabetical lectures: Lectures 1-6 are technical in nature, and Lectures A-E cover soft skills and history.
- Definitions: Data Scientist, Data Engineer, Data Analyst (What do we spend time doing?)
- Definitions: Machine Learning and Artificial Intelligence
- History: What kind of ML is used today? How much of this book is practically useless?
- Neural Networks: Babies and Vision
- Neural Networks: Single Cell Neural Network aka Regression in Excel
- Neural Networks: Name and Height "Regression"
- Neural Networks: When will GPT-3 "insights" become stale? Is this learning? is this engineering? is this science?
- Neural Networks: Correlating words and images
- Neural Networks: Why only study NNs for now? NNs are Decision Trees, NNs vs SVMs
- Neural Networks: Playing Pong with real neurons
- Final Project Intro: Huggable Model and Google Play Virus Model
- VIDEO: Oleh's Car Price Predictor and Source Code
- VIDEO: Fall 2022 Bootcampers Presentation (WARNING: LARGE FILE) and Hanna's Source Code
- Help Brad with FOSS Models for Medusa
- Free captioned images from the web, LAION
- The entire web, scraped for you, Common Crawl via comcrawl
- More specialized data... Datahub and Awesome public datasets and Huggingface Datasets and [Huggingface](https://huggingface.co/docs/datasets/tutorial)
- Definitions: Unix, Linux, Command Line, DevOps, Programming Language
- History: Python and C Speed Test, SQL
- History: BERT, GPT3, DALLE, Stable Diffusion and self-driving cars.
- History: A historical perspective on technological adoption, is it fast or slow? Flavors of technological disruption. (Lateral thinking with withered technology, how many people can use spreadsheets, and Keynes quote)
- Impostor Syndrome: "10,000 Qualified data scientists" Can you trust your professor at Berkeley? Who are the ML Leads at big companies? Who are the IT consultants?
- Impostor Syndrome: What does MIT Say? A review of Managing Technical Professionals.
- Practice: "Head of Data" interview question, how fast can you spin up an environment? Remember your pandas functions
- 9 Reasons why you'll never be a data scientist
- Huge “foundation models” are turbo-charging AI progress
- Language Models: Past, Present, and Future
- Definitions: docker, container, ephemeral, bash
- History: SQL, what it is and why it's important (PowerBI, Tableau, Athena, BigQuery)
- Docker: Command line usage, flags, interactive mode and bash
- Docker in the cloud: How to think about the cloud, Big Providers (AWS, GCP, Azure) and Small (Linode, Oracle, etc...)
- Aside: What are Kaggle and Colab?
- Demonstration: Create a github project, spin up environment, run experiment, save python file, commit changes.
docker pull tensorflow/tensorflow:latest # Download latest stable image
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter # Start Jupyter server
- Run the tensorflow tutorial notebooks for either `classification.ipynb` (if you want to practice image classification) or `text_classification.ipynb` and fit the sample models with the sample data.
- (this was previously assigned, but you MUST know it for this lecture) Command Line for Beginners
- if you do not have experience with SQL please take the Interactive SQL course @ Codecademy
- Read the first half of this article; once it gets into deployment details you can skip the rest: Why use Docker for Machine Learning Development?
- Docker Documentation
- Docker on AWS, or the TLDR version
- Docker on GCP, or the TLDR version
- Docker on Azure
- Spotty
- Tensordock
- History: The FSF and Richard Stallman
- History: FOSS Products, Pixar, Planes, Servers
- History: Unix Family Tree and Linux Family Tree
- Aside: “A Generation Behind” - is it true? is it useful? What if the closed-source stuff eventually becomes free?
- FOSS in practice: calculendar and comcrawl
- Choosing Technologies: How to choose a technology and not stress about it. How to handle buy vs. build and this map
- Downloading Data, Examples: Common Crawl, Common Crawl 2, direct from webpages and CSV.
- What is your ideal dataset?
feature1, feature2, feature3, label
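
One way to read the `feature1, feature2, feature3, label` layout above: every row is one example, the feature columns are the model inputs, and the last column is the answer you want predicted. The sketch below is only an illustration using Python's standard `csv` module; the rows and the names `raw`, `X` and `y` are made up for this example and do not come from any course dataset.

```python
import csv
import io

# Made-up rows in the "feature1, feature2, feature3, label" layout.
raw = """feature1,feature2,feature3,label
5.1,3.5,1.4,0
6.2,2.9,4.3,1
7.7,3.0,6.1,1
"""

# Parse the text as CSV; each row becomes a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(raw)))

# Everything except the "label" column is treated as a model input.
feature_names = [name for name in rows[0] if name != "label"]

# X holds the inputs (one list of floats per row), y holds the labels.
X = [[float(row[name]) for name in feature_names] for row in rows]
y = [int(row["label"]) for row in rows]

print("features:", feature_names)
print("X:", X)
print("y:", y)
```

In a real project the same split is usually done with pandas, but the idea is identical: get the data into this shape first, then worry about the model.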
- Image classification examples: satellite-image-deep-learning, Go-Winner-Prediction, Mushroom Classifier
- Feel free to skim this, or read in detail any essay with a title that speaks to you. Free Software, Free Society
- Notes on The Software Paradox or The Software Paradox (Full Book)
- ETL: What is it and why do we need it?
- Demonstration: Numbers are Data
- Demonstration: Text is Data
- Demonstration: Images are Data
- Pandas: what is it and why do we use it?
- Discussion: Data Collection, ETL and "glue code"
- Setting up your project with the medusa-ml-template
- Pandas in Action or read the O'Reilly book ($ or free trial required)
- OR pandas on datacamp ($ required) or pandas on codecademy ($ required)
- OR Freecodecamp Pandas 10 hour course
- Scraping Data
- APIs
- Python Requests
- Combining datasets
- Demonstration: Simplest Text Classification
- Demonstration: Simplest Image Classification
- Ludwig
- Definition: Accuracy, Precision, Recall, F1, AUC
- Definition: Confusion Matrix... in Tensorflow too
- Discussion: Loss functions vs model metrics?
- Discussion: How do you measure model performance with other ML techniques? (Back to Excel Neural Net for a moment) then Custom Loss Functions and Custom Loss Functions #2
- Discussion: "The Price is Right" Loss Function?
- Discussion: Layer Types and Standard or Template Models
- Discussion: Where to start, how to adjust hyperparameters
- Discussion: How can you steal ideas?
- When was the last time you wrote a custom neural net?
- Do you think there is a competitive future for smaller, locally trained/served models?
- What are the major general advances in ML techniques?
- Definitions: AI Ethics Big 3: Explainability, Bias, and Privacy
- Discussion: Who should die? Self-Driving trolley problems.
- Discussion: I can predict criminality, should I?
- Discussion: Are biased models useful? When?
- Google Researcher Says She Was Fired Over Paper Highlighting Bias in A.I.
- Tesla’s ‘phantom braking’ problem is getting worse, and the US government has questions
- A.I. Is Not Sentient. Why Do People Say It Is?
- The Long Road to Driverless Trucks
- Stuck on the Streets of San Francisco in a Driverless Car
- Copilot and the AI Copyright wars
- "Driverless Cars" with Human Masters
- Demonstration: Tensorflow Lite, Tensorflow Serving
- Discussion: Predict is easy, train is hard (computationally)
- Demonstration: Docker + Flask
- Discussion: DevOps vs MLOps, what is special? what is the same?
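
The "predict is easy, train is hard" point above can be illustrated with a toy sketch. It assumes a single-weight linear model (in the spirit of the single-cell regression from the first lecture) fit by plain gradient descent, and it counts forward passes instead of measuring time. The data, the learning rate and the function names are all hypothetical choices for this example, not the course's demonstration code.

```python
# Toy example: count how many forward passes training needs
# compared with the single forward pass a prediction needs.

def predict(w, b, x):
    """One prediction is a single multiply-add."""
    return w * x + b


def train(xs, ys, epochs=1000, lr=0.05):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Returns (w, b, forward_passes) so the training cost can be compared
    with the cost of a single prediction.
    """
    w, b = 0.0, 0.0
    forward_passes = 0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            forward_passes += 1
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, forward_passes


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]  # made-up data on the line y = 2x + 1

w, b, training_calls = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"forward passes during training: {training_calls}")
print("forward passes for one prediction: 1")
```

Even for five data points, training runs thousands of forward passes (plus the gradient arithmetic), while serving a prediction is a single call. Real models scale both numbers up, but the asymmetry is why a small machine or a phone can often serve a model that needed a GPU to train.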
Bootcampers will spend a tremendous amount of time working on final projects that are targeted to their career goals. For an example final presentation see Oleh's Video (YouTube) and Oleh's Repository (GitHub).
- Deep Learning Illustrated and Deep Learning with Tensorflow, Pytorch etc..
- FreeCodeCamp ML Course in 10 hrs
- HuggingFace Course