80 Best Data Science Books That Are Worthy Reading

wjpsky/pydata


Data science is probably the most popular field nowadays. I believe many people are looking for a way into the industry, and I happened to read an article that lists some great data science books that may help. I have summarized it here and added brief introductions to the books, so you can choose the ones you would like to read. Some of the books are available online, and I have included the links. For most of the others, you will probably need to look on Amazon.

Part I: Data Scientist Core Skills

Data Science, Math, Probability and Statistics, Machine Learning, Data Mining, SQL, R, Python, Data Scientist Interview, Algorithm, Handbook, Web Scraping and Data Wrangling, Data Visualization and Storytelling, A/B Testing

Part II: Data Science Advanced Skills

Neural Network and Deep Learning, Information Theory, Causal Inference, Sampling, Convex, Growth Analytics, Text Mining and Natural Language Processing, Anomaly Detection, Recommender Systems, Social Network Analysis, Time Series Analysis and Forecasting, Reinforcement Learning and Artificial Intelligence

Part III: Leisure Reading

Part I: Data Scientist Core Skills

Data Science

  1. The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists

In this handbook, 25 industry experts share advice that is very helpful for beginners.

  1. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

  1. Doing Data Science: Straight Talk from the Frontline

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Math

  1. Multivariable Calculus

https://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/index.htm

  1. Linear Algebra

https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/index.htm

Probability and Statistics

  1. Introduction to Probability, Statistics, and Random Processes

This book introduces students to probability, statistics, and stochastic processes. It can be used by both students and practitioners in engineering, various sciences, finance, and other related fields. It provides a clear and intuitive approach to these topics while maintaining mathematical accuracy. You can also find courses and videos online. https://www.probabilitycourse.com

  1. OpenIntro Statistics

The OpenIntro project was founded in 2009 to improve the quality and availability of education by producing exceptional books and teaching tools that are free to use and easy to modify. Its inaugural effort is OpenIntro Statistics. Corresponding courses and videos can be found at https://www.openintro.org

  1. Statistical Inference

A textbook used for first-year graduate students at many colleges. It discusses both theoretical statistics and the practical applications of the theoretical developments, and includes a large number of exercises covering both theory and applications.

  1. Applied Linear Statistical Models

Applied Linear Statistical Models is the long established leading authoritative text and reference on statistical modeling. The Fifth edition provides an increased use of computing and graphical analysis throughout, without sacrificing concepts or rigor. In general, the 5e uses larger data sets in examples and exercises, and where methods can be automated within software without loss of understanding, it is so done.

  1. An Introduction to Generalized Linear Models

As the title says, an introduction to generalized linear models.

  1. All of Statistics: A Concise Course in Statistical Inference

This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines.

  1. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science

In this book, Efron and Hastie give a comprehensive introduction to statistics in the big data era.

  1. Statistics in a Nutshell: A Desktop Quick Reference

A quick reference, as the title says.

  1. Bayes' Rule: A Tutorial Introduction to Bayesian Analysis

  2. Think Bayes: Bayesian Statistics in Python

A brief introduction to Bayesian statistics using Python: http://www.greenteapress.com/thinkbayes/thinkbayes.pdf

  1. Bayesian Methods for Hackers

Advanced tutorials on Bayesian statistics using Python: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

  1. Practical Statistics for Data Scientists: 50 Essential Concepts

This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. You can find it here: https://github.com/andrewgbruce/statistics-for-data-scientists

Machine Learning

  1. An Introduction to Statistical Learning: with Applications in R

Without doubt a good book; everyone in the field should have heard of it. http://www-bcf.usc.edu/~gareth/ISL/ https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about

  1. Applied Predictive Modeling

Applied Predictive Modeling covers the overall predictive modeling process. A must-read before an interview or starting a job.

  1. Python Machine Learning

Python Machine Learning Second Edition now includes the popular TensorFlow deep learning library. The scikit-learn code has also been fully updated to include recent improvements and additions to this versatile machine learning library.

  1. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies

A comprehensive introduction to the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications.

  1. Real-World Machine Learning

This book tells you how to use machine learning to solve real-world problems. I strongly recommend that all data scientists read it before starting an internship or a job.

  1. Learning From Data

Explains machine learning theory that many books leave out, such as the VC dimension. https://work.caltech.edu/telecourse.html

  1. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition

This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. This is the great ESL; I think it is best suited to browsing and excerpting.

  1. Pattern Recognition and Machine Learning

The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, at a time when no other books applied graphical models to machine learning.

Data Mining

  1. Principles of Data Mining

A basic introduction to data mining that devotes a lot of space to association rules.

  1. Introduction to Data Mining

Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time.

  1. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management

Uses practical examples to show how data mining can generate revenue from customers.

SQL

  1. SQL Cookbook: Query Solutions and Techniques for Database Developers

This cookbook points out many pitfalls in SQL queries and gives the query code for every popular database.

R

  1. R in Action

The book begins by introducing the R language, including the development environment. Focusing on practical solutions, the book also offers a crash course in practical statistics and covers elegant methods for dealing with messy and incomplete data using features of R.

  1. R for Data Science

  2. R Packages

  3. Advanced R

Written by Professor Hadley Wickham. R for Data Science, with Garrett Grolemund, introduces the key tools for doing data science with R. R Packages teaches good software engineering practices for R, using packages for bundling, documenting, and testing your code. Advanced R helps you master R as a programming language, teaching you what makes R tick.

Python

  1. Think Python

This hands-on guide takes you through the language a step at a time, beginning with basic programming concepts before moving on to functions, recursion, data structures, and object-oriented design. Suitable for beginners.

  1. Fluent Python

Author Luciano Ramalho takes you through Python’s core language features and libraries, and shows you how to make your code shorter, faster, and more readable at the same time.

  1. Python for Probability, Statistics, and Machine Learning

This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas.

  1. Python Data Science Handbook

A very comprehensive handbook on using Python to solve data science problems. https://github.com/jakevdp/PythonDataScienceHandbook

Data Scientist Interview

  1. Data Science Interviews Exposed

Data Science Interviews Exposed offers data science career advice and real interview questions to help you land a six-figure salary job.

  1. Cracking the PM Interview: How to Land a Product Manager Job in Technology

In the U.S., many data scientists work closely with product teams, and some are even employed as product managers, so this book on PM interviews is a useful reference for data scientists.

Algorithm

  1. Grokking Algorithms: An illustrated guide for programmers and other curious people

Grokking Algorithms is a fully illustrated, friendly guide that teaches you how to apply common algorithms to the practical problems you face every day as a programmer.

  1. Problem Solving with Algorithms and Data Structures Using Python

The study of algorithms and data structures is central to understanding what computer science is all about, and that is what this book covers. Electronic edition: http://interactivepython.org/runestone/static/pythonds/index.html

  1. Algorithms in a Nutshell: A Practical Guide

An algorithm guide for quick review.

Handbook

  1. The Data Science Handbook

A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline.

Web Scraping and Data Wrangling

  1. Web Scraping with Python: Collecting Data from the Modern Web

With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands, or even millions, of web pages at once. Alternatively, a tool such as Octoparse may be enough to cover your web scraping needs.

  1. Data Wrangling with Python: Tips and Tools to Make Your Life Easier

This book teaches you how to clean messy raw data and wrangle it into the form you want.

  1. Regular Expressions Cookbook

Regular expressions can be annoying, but you have to face them. You can use this book to look up the regular expressions you need.

Data Visualization and Storytelling

  1. Communicating Data with Tableau: Designing, Developing, and Delivering Data Visualizations

This practical guide shows you how to use Tableau Software to convert raw data into compelling data visualizations that provide insight or allow viewers to explore the data for themselves.

  1. Interactive Data Visualization for the Web: An Introduction to Designing with D3

This fully updated and expanded second edition takes you through the fundamental concepts and methods of D3, the most powerful JavaScript library for expressing data visually in a web browser.

  1. Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data

With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries (including Scrapy, Matplotlib, Pandas, Flask, and D3) for crafting engaging, browser-based visualizations.

  1. Storytelling with Data: A Data Visualization Guide for Business Professionals

This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story.

A/B Testing

  1. A / B Testing: The Most Powerful Way to Turn Clicks Into Customers

  2. Designing with Data: Improving the User Experience with A/B Testing

Part II: Data Science Advanced Skills

The books in this part are recommended for those who wish to become a Saiyan among data scientists.

Neural Network and Deep Learning

  1. Make Your Own Neural Network

A step-by-step gentle journey through the mathematics of neural networks, and making your own using the Python computer language. This guide will take you on a fun and unhurried journey, starting from very simple ideas, and gradually building up an understanding of how neural networks work.

  1. Deep Learning

An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

  1. Hands-On Machine Learning with Scikit-Learn and TensorFlow

This practical book shows you how to use simple and efficient tools to implement programs capable of learning from data.

Information Theory

  1. Data Science and Information Theory

This is an article that introduces the importance of information theory in the data science field.

  2. Information Theory: A Tutorial Introduction

In this richly illustrated book, accessible examples are used to introduce information theory in terms of everyday games like ‘20 questions’ before more advanced topics are explored.

  1. Information, Entropy, Life and the Universe: What We Know and What We Do Not Know

If you are interested in exploring the world of information, entropy, and probability, or just the world in general, this is a great place to start. Arieh takes the reader through a detailed unfolding of these topics while providing numerous common examples to help with these sometimes difficult-to-grasp topics.

Causal Inference

  1. Causal Inference in Statistics: A Primer

Judea Pearl presents a book ideal for beginners in statistics, providing a comprehensive introduction to the field of causality.

  1. Field Experiments: Design, Analysis, and Interpretation

A brief, authoritative introduction to field experimentation in the social sciences.

Sampling

  1. Sampling

Sampling provides an up-to-date treatment of both classical and modern sampling design and estimation methods, along with sampling methods for rare, clustered, and hard-to-detect populations.

Convex

  1. Convex Optimization

A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency.

Growth Analytics

  1. Lean Analytics: Use Data to Build a Better Startup Faster (Lean Series)

Written by Alistair Croll (Coradiant, CloudOps, Startupfest) and Ben Yoskovitz (Year One Labs, GoInstant), the book lays out practical, proven steps to take your startup from initial idea to product/market fit and beyond.

  1. Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity

Web Analytics 2.0 provides specific recommendations for creating an actionable strategy, applying analytical techniques correctly, solving challenges such as measuring social media and multichannel campaigns, achieving optimal success by leveraging experimentation, and employing tactics for truly listening to your customers.

Text Mining and Natural Language Processing

  1. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit

This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. Read online: http://www.nltk.org/book/

  1. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data

Text Analytics with Python teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem.

  1. Introduction to Information Retrieval

Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Read online: https://nlp.stanford.edu/IR-book/

Anomaly Detection

  1. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution.

  1. Outlier Analysis

This book provides comprehensive coverage of the field of outlier analysis from a computer science point of view. It integrates methods from data mining, machine learning, and statistics within the computational framework and therefore appeals to multiple communities.

Recommender Systems

  1. Recommender Systems: The Textbook

This book comprehensively covers the topic of recommender systems, which provide personalized recommendations of products or services to users based on their previous searches or purchases.

Social Network Analysis

  1. Network Science

This pioneering textbook, spanning a wide range of topics from physics to computer science, engineering, economics and the social sciences, introduces network science to an interdisciplinary audience.

  1. Social and Economic Networks

In Social and Economic Networks, Matthew Jackson offers a comprehensive introduction to social and economic networks, drawing on the latest findings in economics, sociology, computer science, physics, and mathematics.

  1. Social Network Analysis for Startups: Finding connections on the social web

You'll learn concepts and techniques for recognizing patterns in social media, political groups, companies, cultural trends, and interpersonal networks.

Time Series Analysis and Forecasting

  1. Practical Time Series Forecasting with R: A Hands-On Guide

The book introduces popular forecasting methods and approaches used in a variety of business applications. The book offers clear explanations, practical examples, and end-of-chapter exercises and cases.

  1. Forecasting: Principles and Practice

This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.

Reinforcement Learning and Artificial Intelligence

  1. Reinforcement Learning: An Introduction

Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications.

  1. Artificial Intelligence: A Modern Approach

Artificial Intelligence: A Modern Approach, 3e offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one or two-semester, undergraduate or graduate-level courses in Artificial Intelligence.

Part III: Leisure Reading

  1. Soft Skills: The software developer's life manual

Soft Skills: The software developer's life manual is a unique guide, offering techniques and practices for a more satisfying life as a professional software developer.

  1. The Healthy Programmer: Get Fit, Feel Better, and Keep Coding

This is an excellent book for any professional who sits too much for the job. It contains informative suggestions to improve your health in ways that fit into your busy day. What makes this book different is its practical suggestions which fit into the hectic lifestyle.

  1. Exposing the Magic of Design

This book offers a way of thinking about complicated, multifaceted problems with a repeatable degree of success. Design synthesis methods can be applied in business to produce new and compelling products and services, or these methods can be applied in government with the goal of changing culture and bettering society.

  1. Thinking, Fast and Slow

The book has about 3k reviews on Amazon. No specific description was given, but I believe it is a great and interesting book for everyone.

  1. Naked Statistics: Stripping the Dread from the Data

Perhaps the most interesting statistics textbook you will ever read.

  1. Uncertainty: The Soul of Modeling, Probability & Statistics

This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science.

Source: Octoparse

