Introduction to PEDS

This book aims to teach you data science through practical examples. It presents a series of notebooks that walk you through a typical data science workflow. We’ll start with notebooks that are completely filled in; as you progress, more and more sections will be left blank for you to fill in with code. Don’t worry: all the code and concepts you need will be in the earlier notebooks. With this approach you can build up your coding skills and learn how to do data science at the same time.

The book is laid out in five parts, following a typical data science workflow (Fig. 1). We’ll start by learning how to use Colaboratory to build notebooks in Python, then move on to working with data in Python. Once we are able to load, clean, and wrangle data, we’ll look at ways to explore that data. In particular, we’ll learn how to visualize the data, how to use descriptive statistics and correlation, and how to apply some unsupervised clustering algorithms. Once we’ve learnt how to explore our datasets, we’ll learn how to build supervised machine learning models to answer specific questions and make specific predictions. Finally, we’ll dive into some of the nuances of interpreting these supervised learning models and how best to communicate them to non-specialists.

Figure 1: Data science workflow we will use in this book
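
To give a flavor of the first few steps of this workflow (load, clean, explore), here is a minimal sketch that uses only Python's standard library. The dataset, its column names (`height_cm`, `weight_kg`), and the helper functions are all invented for illustration; the notebooks in the book use their own data and tools, so treat this as a rough preview rather than the book's method.

```python
import csv
import io
import statistics

# Hypothetical toy data, invented for illustration only.
# One row is missing a height value on purpose.
RAW_CSV = """height_cm,weight_kg
150,50
160,58
,61
170,66
180,75
"""


def load_rows(text):
    """Load: parse CSV text into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: drop rows with missing values and convert the rest to floats."""
    cleaned = []
    for row in rows:
        if all(value and value.strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def pearson(xs, ys):
    """Explore: Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rows = clean_rows(load_rows(RAW_CSV))
heights = [row["height_cm"] for row in rows]
weights = [row["weight_kg"] for row in rows]

print("rows kept after cleaning:", len(rows))
print("mean height:", statistics.mean(heights))
print("height/weight correlation:", round(pearson(heights, weights), 3))
```

The later steps in the workflow (visualization, clustering, supervised models) build on exactly this kind of cleaned table.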

If this book is being used with a course, there are some extras that might be useful! There are three class projects, each building on the last. In these projects you will apply what you’re learning in class to a dataset that you find interesting. A good place to look for data is kaggle.com/datasets, but if you already have a dataset in mind, that is great! By applying what we are learning in this book to a question or dataset you care about, you’ll hopefully get a good sense of how useful these methods really are in practice. Finally, there are also some class data challenges, meant to get you working in groups and trying out algorithms from the class to meet a specific challenge. These challenges rely on collaboration between individuals and groups (e.g., using Slack), though if you are working through this book on your own you can still give them a try!