## Build Your Data Science Project
In this capstone project, you will leverage what you’ve learned throughout the program to build a data science project of your choosing. **Your project deliverables are**:

1. A GitHub repository of your work.
    * The repository must have a README.md file that communicates the libraries used, the motivation for the project, the files in the repository with a short description of each, and a summary of the results.
2. A blog post (or a write-up in another medium) written for a technical audience, or a deployed web application powered by data.

You'll follow the steps of the data science process that we've discussed:

1. You will first define the problem you want to solve and investigate potential solutions.
2. Next, you will analyze the problem through visualizations and data exploration to better understand which algorithms and features are appropriate for solving it.
3. You will then implement the algorithms and metrics of your choice, documenting the preprocessing, refinement, and post-processing steps along the way.
4. Afterward, you will collect results about your findings, visualize significant quantities, validate/justify your results, and make any concluding remarks about whether your implementation adequately solves the problem.
5. Finally, you will either (i) write a blog post (or a write-up in another medium) that documents every step of your project from start to finish, or (ii) deploy your results as a web application.

## Capstone Project Report Structure

Writing a clear, concise, and well-structured data science report is a critical step in communicating your key findings and processes to your peers. **If you choose to provide a blog post (or a write-up in another medium), your project report should have the following structure**:

1. **Section 1: Project Definition**
    * `Project Overview`: give a high-level overview of the project, including background information such as the problem domain, the project origin, and related data sets or input data.
    * `Problem Statement`: define the problem to be solved.
    * `Metrics`: define the metrics used to measure the results and justify why you chose them. For example, if you use time-series data sets, explain which metrics are appropriate for measuring the results.

2. **Section 2: Analysis**
    * `Data Exploration`: describe the data sets, including the features, data distributions, and descriptive statistics. Identify any abnormalities or specific characteristics inherent in the data sets.
    * `Data Visualization`: build data visualizations based on the data exploration in the previous step.

3. **Section 3: Methodology**
    * `Data Preprocessing`: describe the steps taken to preprocess the data and address any abnormalities in the data sets. If data preprocessing is not needed, please explain why.
    * `Implementation`: discuss how you applied the models, algorithms, and techniques to solve the problem. Mention any complications you encountered during the implementation.
    * `Refinement`: describe the process to refine the algorithms and techniques, such as using cross-validation or changing the parameter settings.

4. **Section 4: Results**
    * `Model Evaluation and Validation`: discuss the models and parameters used in the methodology. If no model is used, you can discuss the methodology using data visualizations and other means.
    * `Justification`: discuss the final results in detail and explain why some models, parameters, or techniques perform better than others. Show and compare the results in tables or charts.

5. **Section 5: Conclusion**
    * `Reflection`: summarize the end-to-end problem solution and discuss one or two particular aspects that you find interesting or difficult to implement.
    * `Improvement`: suggest directions for future work that could improve the experiment.
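
As one illustration of the `Metrics` and `Refinement` items above, the following minimal sketch scores a model with k-fold cross-validation and compares a few parameter settings. Everything in it is an assumption made for illustration: the toy data, the mean-absolute-error metric, the simple nearest-neighbor predictor, and the helper names (`k_fold_indices`, `knn_predict`, `cross_validate`) are hypothetical and not part of the project requirements. It uses only the Python standard library; in a real project you would more likely rely on an established library.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict by averaging the targets of the n_neighbors closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return statistics.fmean(train_y[i] for i in nearest)


def cross_validate(xs, ys, n_neighbors, k=5):
    """Return the mean MAE over k folds for a given n_neighbors setting."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held_out_set = set(held_out)
        train = [i for i in range(len(xs)) if i not in held_out_set]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        preds = [knn_predict(train_x, train_y, xs[i], n_neighbors) for i in held_out]
        scores.append(mean_absolute_error([ys[i] for i in held_out], preds))
    return statistics.fmean(scores)


# Toy data (made up for illustration): y is roughly 2x plus a little noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + rng.uniform(-1.0, 1.0) for x in xs]

# Refinement step: compare a few parameter settings and keep the best one.
results = {n: cross_validate(xs, ys, n_neighbors=n) for n in (1, 3, 5, 9)}
best_n = min(results, key=results.get)
for n, score in results.items():
    print(f"n_neighbors={n}: mean MAE={score:.3f}")
print("best setting:", best_n)
```

A printed comparison of settings and scores like this is the kind of evidence the `Justification` item asks for. Note that a shuffled split is not suitable for time-series data, because it lets the model train on observations that come after the ones it is tested on; a chronological split would be the more appropriate choice there.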

Capstone project report examples:

* [Plot and Navigate a Virtual Maze](https://github.com/udacity/machine-learning/blob/master/projects/capstone/report-example-3.pdf)
* [Vision Loss](https://github.com/udacity/machine-learning/blob/master/projects/capstone/report-example-1.pdf)

(Optional) further readings:

* [Elements of a Scientific Report](https://www.waikato.ac.nz/library/guidance/guides/write-scientific-reports)
* [Components of a Scientific Report](https://canvas.hull.ac.uk/courses/370/pages/components-of-a-scientific-report)

## Anatomy of a README.md Document

A `readme` file contains relevant information about the files in a project's directory. A `readme` should provide just enough context to get other users up and running with your code. **Keep in mind that you are writing the README for other users**.

* Start with a `title` and a `description`: be sure to capture the spirit of your project clearly and concisely. This will help frame the reader's experience when going through your documentation.
* Next, include any information that is absolutely necessary for understanding your code. This may be `dependencies` on other software or libraries, `installation instructions`, `common usage`, or `known bugs`.
* You can also add other information, such as the `author` and the `software license`.
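
The outline above can be turned into a starting skeleton. The sketch below is one possible way to do that, not a required format: the helper `build_readme`, the section headings, and the placeholder text are all hypothetical and chosen only to mirror the items listed in this document.

```python
def build_readme(title, description, sections):
    """Return README text: a title, a short description, then one heading per section."""
    lines = [f"# {title}", "", description, ""]
    for heading, body in sections.items():
        lines += [f"## {heading}", "", body, ""]
    return "\n".join(lines)


# Placeholder project details, invented for this sketch.
readme_text = build_readme(
    title="Example Capstone Project",
    description="One or two sentences on what the project does and why.",
    sections={
        "Libraries Used": "List each library the code depends on.",
        "Motivation": "Explain why the project exists.",
        "Files": "Describe each file in the repository in one line.",
        "Results": "Summarize the main findings.",
        "Author and License": "Name the author and the software license.",
    },
)
print(readme_text)
```

You could paste the printed text into `README.md` and replace each placeholder with your own content, adding sections such as installation instructions or known bugs where they apply.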

(Optional) Additional readings:
* Udacity free course on [Writing READMEs](https://www.udacity.com/course/writing-readmes--ud777)
* [How to Write a Good README File for Your GitHub Project](https://www.freecodecamp.org/news/how-to-write-a-good-readme-file/), an article from freeCodeCamp

## Setting Yourself Apart

An important part of landing a job or advancing your career as a data scientist is setting yourself apart through impressive data science projects. You've already completed several guided projects, and this is your chance to show off your skills and creativity. A Udacity mentor will review your project and give you feedback that focuses on how well it demonstrates your skills as a well-rounded data scientist.

This project is designed to prepare you for delivering a polished report on an end-to-end solution to a real-world problem in a field that interests you. When developing new technology, or adapting existing technology, properly documenting your process is critical for both validating and replicating your results.

Things you will learn by completing this project:

* How to research and investigate a real-world problem of interest.
* How to accurately apply specific data science algorithms and techniques.
* How to properly analyze and visualize your data and results for validity.
* How to document and write a report of your work.