• Personal certification page: https://www.datacamp.com/certificate/DS0017278428696
• 23 courses, 6 projects, 3 skill assessments
- Introduction to Python
Master the basics of data analysis in Python. Expand your skillset by learning scientific computing with NumPy.
- Intermediate Python
Level up your data science skills by creating visualizations using Matplotlib and manipulating DataFrames with pandas.
PROJECT: Investigating Netflix Movies and Guest Stars in The Office
- Data Manipulation with pandas
Use the world’s most popular Python data science package to manipulate data and calculate summary statistics.
PROJECT: The Android App Market on Google Play
- Joining Data with pandas
Learn to combine data from multiple tables by joining data together using pandas.
PROJECT: The GitHub History of the Scala Language
- Introduction to Data Visualization with Matplotlib
Learn how to create, customize, and share data visualizations using Matplotlib.
- Introduction to Data Visualization with Seaborn
Learn how to create informative and attractive visualizations in Python using the Seaborn library.
- Python Data Science Toolbox (Part 1)
Learn the art of writing your own functions in Python, as well as key concepts like scoping and error handling.
- Python Data Science Toolbox (Part 2)
Continue to build your modern data science skills by learning about iterators and list comprehensions.
- Intermediate Data Visualization with Seaborn
Use Seaborn’s sophisticated visualization tools to make beautiful, informative visualizations with ease.
PROJECT: A Visual History of Nobel Prize Winners
SKILL ASSESSMENT: Data Manipulations with Python
- Introduction to Importing Data in Python
Learn to import data into Python from various sources, such as Excel, SQL, SAS, and right from the web.
- Intermediate Importing Data in Python
Improve your Python data importing skills and learn to work with web and API data.
- Cleaning Data in Python
Learn to diagnose and treat dirty data and develop the skills needed to transform your raw data into accurate insights!
- Working with Dates and Times in Python
Learn how to work with dates and times in Python.
SKILL ASSESSMENT: Importing & Cleaning Data with Python
- Writing Functions with Python
Learn to use best practices to write maintainable, reusable, complex functions with good documentation.
SKILL ASSESSMENT: Python Programming
- Exploratory Data Analysis in Python
Learn how to explore, visualize, and extract insights from data.
- Analyzing Police Activity with pandas
Explore the Stanford Open Policing Project dataset and analyze the impact of gender on police behaviour using pandas.
- Statistical Thinking in Python (Part 1)
Build the foundation you need to think statistically and to speak the language of your data.
- Statistical Thinking in Python (Part 2)
Learn to perform the two key tasks in statistical inference: parameter estimation and hypothesis testing.
PROJECT: Dr. Semmelweis and the Discovery of Handwashing
- Machine Learning with scikit-learn
Learn how to build and tune predictive models and evaluate how well they’ll perform on unseen data.
PROJECT: Predicting Credit Card Approvals
- Unsupervised Learning in Python
Learn how to cluster, transform, visualize, and extract insights from unlabelled datasets using scikit-learn and scipy.
- Machine Learning Tree-Based Models in Python
In this course, you’ll learn how to use tree-based models and ensembles for regression and classification using scikit…
- Case Study: School Budgeting with Machine Learning in Python
Learn how to build a model to automatically classify items in a school budget.
- Cluster Analysis in Python
In this course, you will be introduced to unsupervised learning through techniques such as hierarchical and k-means c…
• Four 40-minute timed assessments
- Coding for Production
- Statistical Experimentation
- Exploratory Analysis with PostgreSQL
- Model Development
• One coding challenge
• Final case study [1]
- Two parts:
- Technical report for data science manager
- Presentation for non-technical audience
- Problem type: Binary classification
- Work environment: DataCamp Workspace
- Libraries/modules used: pandas, numpy, seaborn, matplotlib, BeautifulSoup, nltk, PorterStemmer, TfidfVectorizer, WordCloud, operator, plotly, sklearn, time, imblearn, LogisticRegressionCV, RandomForestClassifier, classification_report, confusion_matrix, balanced_accuracy_score, matthews_corrcoef, geometric_mean_score, compute_class_weight, GridSearchCV
- Workflow:
- Read CSV data: >40000 entries
- Data exploration
- Subset data for NLP analysis
- Data quality: duplicates/missing data/quality check e.g. mojibake
- Feature engineering e.g. combining non-numerical columns
- Text preprocessing (e.g. stemming, tokenization, TF-IDF)
- Data summary and visualizations e.g. word clouds, distribution
- Stratified train-test split
- Machine learning algorithms with class weights: compare metrics, select the best model, and tune hyperparameters (GridSearchCV plus decision-threshold adjustment) to answer the business question and meet the business success criteria
- Summary/result/discussion: Findings, final model, metrics trade-off
- Recommendations for future work: SME involvement, better data quality (e.g. well-defined features), better data representativeness (e.g. additional features), other methods (e.g. deep learning)
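The class-weighting, decision-threshold, and metric-comparison steps of the workflow can be illustrated with a minimal, standard-library-only Python sketch. This is not the case-study code, which is subject to sharing restrictions and relied on scikit-learn and imblearn. The labels, scores, and function names below are hypothetical, and the formulas are the usual textbook definitions of balanced class weights, balanced accuracy, geometric mean, and the Matthews correlation coefficient.

```python
from collections import Counter
from math import sqrt


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), labelling score >= threshold as positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def imbalance_metrics(tp, fp, tn, fn):
    """Metrics that stay informative when one class dominates."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "geometric_mean": sqrt(recall * specificity),
        "mcc": mcc,
    }


# Hypothetical toy data: 8 negatives, 2 positives, made-up model scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.90]

weights = balanced_class_weights(y_true)
print("class weights:", weights)

# Compare the default 0.5 threshold with a lowered one.
results = {
    t: imbalance_metrics(*confusion_counts(y_true, scores, t))
    for t in (0.5, 0.4)
}
for threshold, metrics in results.items():
    print(threshold, metrics)
```

On this toy data, lowering the threshold from 0.5 to 0.4 recovers the missed positive at the cost of one more false positive, which is the kind of metrics trade-off noted in the summary step.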
Footnotes
1. Restrictions on sharing as advised by DataCamp